Artifactory as an IT Service @ Siemens [swampUP 2020]

Marija Kuester, Service Manager, Siemens

July 7, 2020


Best Practices for Artifactory Backups and Disaster Recovery: https://jfrog.com/whitepaper/best-pra…

What are the advantages and challenges of setting up Artifactory as an IT Service in a company?

Video Transcript

Hi guys, my name is Marija Kuester. I'm an IT service manager at Siemens, and today my colleagues and I would like to give you some insight into our motivation and the challenges we faced setting up JFrog Artifactory and running it as an IT service in our company.

Three years ago we had the following situation. A lot of developer teams were working with different development platforms and using various decentralized solutions for hosting their binaries. Some teams hosted their binaries on the TFS platform, which in principle is not designed for binary hosting. Some teams hosted their binaries in ClearCase. Some teams used different network shares and put their binaries there, some teams hosted their binaries on local machines, and sometimes teams found their own solutions, building shadow IT. All of these solutions had to fulfill some important company requirements: how to keep everything secure, how to share binaries with other projects, how to fulfill all legal requirements, how to reduce costs in this area, and how to make everything more performant and available for developers. Our company clearly needed to establish a unified, centrally managed platform for binary hosting. It had to fulfill all functional and performance requirements coming from developers and all security and legal requirements coming from management, and it had to reduce the cost of administration.

Today we provide a global service within Siemens, making Artifactory available to the developer teams in our company. We have one dedicated team that administers all Artifactory instances spread worldwide. This team supports developer teams in integrating Artifactory into their development pipelines and makes sure all security and legal requirements are met so that binaries are managed properly. It is also in permanent contact with JFrog to get support from their side.
Here is our service in numbers. We have Artifactory clusters in 15 locations spread worldwide on three continents. We're hosting thirty or forty-three servers in the background, we have one team taking care of the whole server setup, and we have one supplier, JFrog. We're supporting more than 250 software projects and serving around 6,000 developers, and we have 150 million accesses per month on our systems worldwide.

What are the advantages of setting up Artifactory as an IT service? We set up Artifactory as the single binary hub in our company. We established a Siemens inner source hub using the Artifactory platform. We have a pretty good overview of the third-party binaries being used in our company, and we scan them for security and license vulnerabilities. We reduced shadow IT, because we have one central service used by many products. We cover all legal requirements for binary hosting, we cover all security requirements for binary hosting, and we reduce costs by using a centralized solution. These are the advantages we get from setting up JFrog products as an IT service. My colleagues, our innovation guy Andreas and our service architect, will talk more about the challenges of running Artifactory as an IT service in our company. They will also cover the solutions we implemented or are about to implement to keep our service available, performant, predictable, and secure. Thank you very much, and guys, it's your turn.
Hi. After this introduction to our service and what it looks like, I now want to give you some hints on how to set up your own service and which supporting technologies you can use for it. First, I want to talk about the difference between the application experience and the service experience. The application experience is what you all know of Artifactory: is the application great to use, and is it good for the users? JFrog provides this. However, this is not the whole experience. You also have topics like the availability of your system, integrations with third-party products, costs, and the trainings you provide to your users, and making these great is entirely your task.

About the ratio between application experience and service experience: it depends on your own offering, and the share of the app experience can be bigger or smaller. Consider, for example, a small-scale system used by a few teams, running as a single instance, with no legal requirements and only out-of-the-box usage. There, it's more or less the app experience. But consider a situation like ours: a huge scale with systems worldwide, lots of regulations like export control or FDA approvals, and an integrated workflow with third-party tools like the source code repository in your own company. There, it might matter less how good the app really is and more how good your service is. Our service, for example, has quite a lot of instances worldwide and is used by thousands of users and business units, so how we set up our service is really important. It could be the same for you, and a bad service offering can outweigh a great application, so keep this in mind.
To support you and give you some hints, I now want to analyze some supporting tools and workflows from the point of view of different stakeholders. For this I want to pick the developer and the project manager.

Let's look at the developer first. A typical developer has quite a few interaction points with your service that you might not often think about. These include build failures or downtimes, onboarding and offboarding so that he can use your system, bugs he might experience during usage, customizations and interfaces to other company assets, and the support and training you offer. I want to take an in-depth look at failures and downtime. A typical scenario would be a user whose build breaks in the archive-artifacts step of a CI/CD engine, and who asks himself: is Artifactory really not working, and how do I find out? In the end, as a service provider you are guilty until proven innocent: people will always assume it's your fault if something is not working. So gather metrics and data that give a clear picture of whether everything is up, running, and working fine. You have to prove this; it's on you.
Also define the term "running system". Is a ping on the machine enough for your customers? Is it an HTTP 200 on the main page? Do you upload and download a small sample file and then say the system is working? Or do you have a defined service level agreement based on a use case, for example "we have to upload this artifact", that also covers replications and access rights changes, because that is what you agreed with the customer?
Now let's quickly discuss a possible setup for measuring this. You should check quite regularly. I would propose at least once per minute, because you always want to know the current status, not old information. You could, for example, run a Jenkins job that uses a normal Artifactory user and logs into all systems (in this case we have three). Connect in a way that is as similar to a real user as possible, for example with the JFrog CLI, and parallelize the checks for speed, because you want to check all systems at once. You can then, for example, upload and download a small file, compare the checksums before and after, and change some properties on the system so that you also exercise the APIs. Then measure all of the results.
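The probe described above could look roughly like the following Python sketch. It assumes a hypothetical `RepoClient` interface with `upload`, `download`, and `set_property` methods; in a real setup these would wrap the JFrog CLI or the REST API, which this sketch does not call. The in-memory client is only a test double.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, Protocol


class RepoClient(Protocol):
    """Hypothetical client interface; a real one might wrap the JFrog CLI or REST API."""

    def upload(self, path: str, data: bytes) -> None: ...
    def download(self, path: str) -> bytes: ...
    def set_property(self, path: str, key: str, value: str) -> None: ...


@dataclass
class ProbeResult:
    instance: str
    ok: bool
    detail: str


def probe(instance: str, client: RepoClient, path: str = "healthcheck/probe.bin") -> ProbeResult:
    """Upload a small random file, download it, compare checksums, then set a property."""
    payload = os.urandom(1024)
    expected = hashlib.sha256(payload).hexdigest()
    try:
        client.upload(path, payload)
        actual = hashlib.sha256(client.download(path)).hexdigest()
        if actual != expected:
            return ProbeResult(instance, False, "checksum mismatch")
        client.set_property(path, "healthcheck", "ok")  # also exercises the metadata API
        return ProbeResult(instance, True, "ok")
    except Exception as exc:  # any error counts as a failed probe
        return ProbeResult(instance, False, f"error: {exc}")


def probe_all(clients: Dict[str, RepoClient]) -> Dict[str, ProbeResult]:
    """Probe every instance in parallel so one slow instance does not delay the others."""
    with ThreadPoolExecutor(max_workers=max(1, len(clients))) as pool:
        futures = {name: pool.submit(probe, name, c) for name, c in clients.items()}
        return {name: f.result() for name, f in futures.items()}


class InMemoryClient:
    """Test double only; corrupt=True simulates a storage fault."""

    def __init__(self, corrupt: bool = False) -> None:
        self.files: Dict[str, bytes] = {}
        self.props: Dict[str, Dict[str, str]] = {}
        self.corrupt = corrupt

    def upload(self, path: str, data: bytes) -> None:
        self.files[path] = data + b"!" if self.corrupt else data

    def download(self, path: str) -> bytes:
        return self.files[path]

    def set_property(self, path: str, key: str, value: str) -> None:
        self.props.setdefault(path, {})[key] = value


if __name__ == "__main__":
    for result in probe_all({"eu": InMemoryClient(), "us": InMemoryClient(corrupt=True)}).values():
        print(result)
```

Running the probes in a thread pool mirrors the advice to parallelize the checks. A random payload ensures that a stale cached copy cannot make a broken instance look healthy.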
Compare them against a defined good state. For example, you could say the system is healthy as long as the results are fine, and that it becomes unhealthy after three consecutive failures. This way you still see any issues quickly, and you can ignore false positives such as brief network glitches.
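The "three consecutive failures" rule can be sketched as a small state holder in Python. The class name and the reset-on-success behavior are our reading of the rule, not code from the talk.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Tracks one instance; it turns unhealthy only after `threshold` failed probes in a row."""

    threshold: int = 3
    consecutive_failures: int = 0

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result and return whether the instance now counts as healthy."""
        self.consecutive_failures = 0 if probe_ok else self.consecutive_failures + 1
        return self.healthy

    @property
    def healthy(self) -> bool:
        return self.consecutive_failures < self.threshold

    @property
    def traffic_light(self) -> str:
        """Color for a landing-page overview: green or red."""
        return "green" if self.healthy else "red"


tracker = HealthTracker()
history = [tracker.record(ok) for ok in [True, False, False, True, False, False, False]]
print(history)  # → [True, True, True, True, True, True, False]
```

The two isolated failures in the middle never flip the light. Only the final run of three does, which is how short network glitches are filtered out.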
Also make sure you don't just have these results, but can show them to your customers in an easy-to-understand overview. This could be a landing page that everyone can access, with one traffic light per server that shows green or red. Also log the results for long-term analysis, because you want to show that your uptime is, for example, 99.9 percent according to the SLA. That brings me to another stakeholder.
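With one probe per minute, the logged results translate directly into an uptime figure. Here is a minimal Python sketch, assuming each sample is one healthy or unhealthy probe result and a 30-day month. For example, a 99.9 percent target over 43,200 one-minute samples leaves 43.2 minutes of allowed downtime.

```python
from typing import Sequence

MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 one-minute probe samples


def availability(samples: Sequence[bool]) -> float:
    """Fraction of healthy samples in the period."""
    if not samples:
        raise ValueError("no samples recorded")
    return sum(samples) / len(samples)


def allowed_downtime_minutes(sla_target: float, period_minutes: int) -> float:
    """Downtime budget that an SLA target such as 0.999 leaves within a period."""
    return (1.0 - sla_target) * period_minutes


def meets_sla(samples: Sequence[bool], sla_target: float) -> bool:
    return availability(samples) >= sla_target


print(round(allowed_downtime_minutes(0.999, MINUTES_PER_30_DAY_MONTH), 1))  # → 43.2
```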
This is the team lead or project manager, who also has quite a lot of interactions with the service. For example, they often want to know their project's resource consumption, because they have to pay for it. They need onboarding for new projects and want to be up and running quickly. They have to maintain the project, for example by checking permissions or adding new users. They might also be innovation drivers, because they want their project to succeed and always want to know whether you provide new features.

For this stakeholder, let's discuss project creation and maintenance. To provide a proper service, you need additional information from your customer. Who owns a repository? Who will pay for it? Who should be contacted in case of issues like a hacker attack? Who is allowed to grant access to the project resources? Are there any special legal requirements you have to cover, such as export control or FDA approval? There's a lot here, and you have to work together with your customer to get this information, because you cannot provide it on your own. But once you have it, it can be a baseline for future automation and easier interactions.
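The information listed above could be captured as a structured record before any automation runs. Here is a minimal Python sketch; the field names and the set of legal flags are hypothetical, derived from the questions in the talk.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical flags for the special legal requirements mentioned in the talk.
KNOWN_LEGAL_FLAGS = {"export-control", "fda"}


@dataclass
class ProjectRequest:
    project: str
    owner: str                   # who owns the repositories
    cost_center: str             # who pays for them
    security_contact: str        # whom to contact in case of an incident such as an attack
    access_approvers: List[str]  # who may grant access to project resources
    legal_flags: List[str] = field(default_factory=list)

    def problems(self) -> List[str]:
        """Everything missing or unknown; an empty list means the request is complete."""
        found = [f"missing {name}" for name in ("project", "owner", "cost_center", "security_contact")
                 if not getattr(self, name).strip()]
        if not self.access_approvers:
            found.append("missing access_approvers")
        found += [f"unknown legal flag {flag}"
                  for flag in sorted(set(self.legal_flags) - KNOWN_LEGAL_FLAGS)]
        return found


ok = ProjectRequest("alpha", "jane.doe", "CC-1234", "alpha-security@example.com",
                    ["jane.doe"], ["export-control"])
print(ok.problems())  # → []
```

Rejecting incomplete requests up front keeps later steps, such as repository creation or cost reporting, from running on partial data.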
Here is a possible setup for automatic project creation and maintenance. A good approach would be a small UI where users can log in and request new projects, so that no manual interaction is needed. There you should gather all the information you need, which always depends a bit on your own setup. Store it in a separate database rather than just in Artifactory, because you need it globally, not just for one instance.
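To illustrate keeping project metadata outside any single Artifactory instance, here is a minimal Python sketch that uses SQLite as a stand-in for whatever central database you run. The schema is hypothetical.

```python
import json
import sqlite3
from typing import List

SCHEMA = """
CREATE TABLE IF NOT EXISTS projects (
    name        TEXT PRIMARY KEY,
    owner       TEXT NOT NULL,
    cost_center TEXT NOT NULL,
    legal_flags TEXT NOT NULL          -- JSON list, e.g. ["export-control"]
);
CREATE TABLE IF NOT EXISTS project_instances (
    project  TEXT NOT NULL REFERENCES projects(name),
    instance TEXT NOT NULL,            -- one row per Artifactory instance hosting the project
    PRIMARY KEY (project, instance)
);
"""


def open_registry(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def register(conn: sqlite3.Connection, name: str, owner: str, cost_center: str,
             legal_flags: List[str], instances: List[str]) -> None:
    """Store a project and the instances it lives on in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO projects VALUES (?, ?, ?, ?)",
                     (name, owner, cost_center, json.dumps(legal_flags)))
        conn.executemany("INSERT INTO project_instances VALUES (?, ?)",
                         [(name, inst) for inst in instances])


def instances_of(conn: sqlite3.Connection, name: str) -> List[str]:
    rows = conn.execute(
        "SELECT instance FROM project_instances WHERE project = ? ORDER BY instance", (name,))
    return [row[0] for row in rows]


conn = open_registry()
register(conn, "alpha", "jane.doe", "CC-1234", ["fda"], ["us", "eu"])
print(instances_of(conn, "alpha"))  # → ['eu', 'us']
```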
With this information, you could automatically create all the repositories on all your instances via the REST API and automatically add the replications. You could also, for example, create a group in your corporate Active Directory and hand over group ownership to the user who requested it. That user can then maintain the members himself, so this is no longer a task for your central service. After you've created the repositories, also add them to your regular monitoring jobs. Your end users need this information, such as usage patterns of the repositories, the costs they create, or security issues like public anonymous write access, and you want to report it to them. So add the new repositories there as well.
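Repository creation across instances might look like the following Python sketch. It builds one call per instance to Artifactory's Create Repository REST endpoint (`PUT /api/repositories/{repoKey}` with a JSON repository configuration). The instance URLs, the repository naming scheme, and bearer-token authentication are assumptions, and replications and the Active Directory group are left out.

```python
import json
import urllib.request
from typing import Dict, List

# Hypothetical instance base URLs.
INSTANCES: Dict[str, str] = {
    "eu": "https://artifactory-eu.example.com/artifactory",
    "us": "https://artifactory-us.example.com/artifactory",
    "asia": "https://artifactory-asia.example.com/artifactory",
}


def repo_key(project: str, package_type: str) -> str:
    return f"{project}-{package_type}-local"  # hypothetical naming convention


def create_repo_requests(project: str, package_type: str, token: str) -> List[urllib.request.Request]:
    """Build one 'create repository' request per instance; nothing is sent here."""
    key = repo_key(project, package_type)
    body = json.dumps({
        "key": key,
        "rclass": "local",
        "packageType": package_type,
        "description": f"Created automatically for project {project}",
    }).encode("utf-8")
    return [
        urllib.request.Request(
            f"{base}/api/repositories/{key}",
            data=body,
            method="PUT",
            headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        )
        for base in INSTANCES.values()
    ]


def send_all(requests: List[urllib.request.Request]) -> List[int]:
    """Send the requests; urlopen raises HTTPError on 4xx/5xx responses."""
    statuses = []
    for req in requests:
        with urllib.request.urlopen(req, timeout=30) as resp:
            statuses.append(resp.status)
    return statuses


for req in create_repo_requests("alpha", "generic", "example-token"):
    print(req.get_method(), req.full_url)
```

Building the requests separately from sending them makes it easy to review or log exactly what will be created on each instance before anything changes.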
Yes, and after this short overview of possible service setups, I now want to hand over to our service architect.

So, hello everybody, and welcome to swampUP. As Marija mentioned, I'm a service architect on the Siemens Artifactory service. In the next five minutes I would like to show you some growing pains we experienced and some solutions we came up with. On the screen you can see this map again. Here is another representation of our servers, just pictures of servers, to illustrate a problem we once had. Some servers became unresponsive for minutes, or sometimes just tens of seconds, but strangely there was no indication at all in our monitoring system of what the problem could be. Clearly our monitoring, which covered standard things like CPU and storage, was not sufficient. So I would like to show you what additional monitoring we installed to get around this problem.
Here you see HTTP threads, access threads, background workers, DB connections, and the JVM. These are all metrics exposed by the Java virtual machine. We read this data with a JMX client, which in our case is Jolokia, but there are other great clients you can use as well. As you can see, we can now monitor these values and set alerts on them. We also have this health check, which is basically a standard script that uploads some files, downloads some files, performs standard tasks, and measures how long they take. This monitoring is really essential for a service at this scale; without it we wouldn't be able to function, as you saw.
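One alert worth having on this JMX data is HTTP thread pool saturation. Tomcat's `ThreadPool` MBean exposes the attributes `currentThreadsBusy` and `maxThreads`. The Python sketch below assumes those values have already been collected per node, for example through a JMX-over-HTTP bridge such as Jolokia, and uses a hypothetical 80 percent threshold.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ThreadPoolSample:
    """Values of Tomcat's ThreadPool MBean attributes currentThreadsBusy and maxThreads."""

    current_threads_busy: int
    max_threads: int

    @property
    def utilization(self) -> float:
        # Treat a missing or zero maxThreads as fully saturated so it gets noticed.
        return self.current_threads_busy / self.max_threads if self.max_threads > 0 else 1.0


def saturated_nodes(samples: Dict[str, ThreadPoolSample], threshold: float = 0.8) -> List[str]:
    """Nodes whose HTTP thread pool is at or above the (hypothetical) alert threshold."""
    return sorted(node for node, s in samples.items() if s.utilization >= threshold)


samples = {
    "node-1": ThreadPoolSample(current_threads_busy=50, max_threads=200),
    "node-2": ThreadPoolSample(current_threads_busy=190, max_threads=200),
    "node-3": ThreadPoolSample(current_threads_busy=200, max_threads=200),
}
print(saturated_nodes(samples))  # → ['node-2', 'node-3']
```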
Before that, there were sporadic problems, and it took several weeks together with JFrog to find out that the cause was the HTTP thread pool in Tomcat running out of threads. This data also helped us a lot on another occasion, in one of our clusters. Again, the cluster was unresponsive for some time, and the HTTP thread data showed us that one of the nodes was getting all the real requests. As you can see, some other nodes also showed a high number of threads, but these were event-based replication threads. Because the load balancing scheme was based on the lowest number of threads, this caused the problem. We adjusted the load balancing and solved it.
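Here is a deliberately simplified Python simulation of why a least-threads scheme misbehaves when background threads are counted. Each routed request is assumed to stay open, and the replication thread counts are made-up numbers. It illustrates the failure mode only; the talk does not say which adjustment Siemens made.

```python
from typing import Dict


def pick_least_threads(active_threads: Dict[str, int]) -> str:
    """Least-threads choice: the node with the fewest active threads wins (ties broken by name)."""
    return min(sorted(active_threads), key=lambda node: active_threads[node])


def route(user_requests: int, background_threads: Dict[str, int]) -> Dict[str, int]:
    """Route requests one by one; each stays open, adding one thread to its node."""
    threads = dict(background_threads)
    routed = {node: 0 for node in threads}
    for _ in range(user_requests):
        node = pick_least_threads(threads)
        threads[node] += 1
        routed[node] += 1
    return routed


# Replication threads inflate the counts on two nodes, so the third gets all user traffic.
print(route(30, {"node-a": 0, "node-b": 40, "node-c": 40}))  # → {'node-a': 30, 'node-b': 0, 'node-c': 0}
# Without the background threads, the same scheme spreads the load evenly.
print(route(30, {"node-a": 0, "node-b": 0, "node-c": 0}))  # → {'node-a': 10, 'node-b': 10, 'node-c': 10}
```

The balancer cannot tell replication threads from user-request threads, so it keeps choosing the node that looks idle even as that node saturates.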
But such experiences come with growth, and it was great to have JFrog's support. Without this JVM-based monitoring we couldn't function. These were some of the problems we had to overcome, and I hope they give you some ideas. I would also like to show you something else: all of our servers generate logs all the time.
For many applications this is of little interest. People look at some logs maybe once a year, or just archive them for traceability. For us, though, digging into logs is part of daily life. Here's an example. A user comes to us and says: "Hey, my files are not replicated to the remote location, what's going on?" Then we have to go to the logs and check whether he really uploaded the file. Maybe it's just a file name mistake on his part. Maybe the replication ran into an error and failed, or maybe it's still running.
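Answering such a question comes down to following one artifact path through the merged logs of all servers. Here is a minimal Python sketch, assuming the central log store already yields normalized records with the hypothetical fields `timestamp`, `server`, `type`, `path`, and `status`. Timestamps are assumed to be ISO 8601 strings in one format and time zone, so that string order equals time order.

```python
from typing import Dict, List

Record = Dict[str, str]  # hypothetical normalized log record


def timeline(records: List[Record], path: str) -> List[Record]:
    """All records for one artifact path, across every server, in time order."""
    return sorted((r for r in records if r["path"] == path), key=lambda r: r["timestamp"])


def diagnose(records: List[Record], path: str, target_server: str) -> str:
    """Rough answer to 'why is my file not replicated to target_server?'."""
    events = timeline(records, path)
    if not any(r["type"] == "upload" for r in events):
        return "no upload found: check the file name"
    replications = [r for r in events if r["type"] == "replication" and r["server"] == target_server]
    if not replications:
        return "replication not started yet"
    last = replications[-1]["status"]
    return {"ok": "replicated", "error": "replication failed"}.get(last, "replication still running")


records = [
    {"timestamp": "2020-07-07T10:00:00Z", "server": "eu", "type": "upload", "path": "libs/app-1.0.jar", "status": "ok"},
    {"timestamp": "2020-07-07T10:00:05Z", "server": "us", "type": "replication", "path": "libs/app-1.0.jar", "status": "started"},
    {"timestamp": "2020-07-07T10:00:09Z", "server": "us", "type": "replication", "path": "libs/app-1.0.jar", "status": "error"},
]
print(diagnose(records, "libs/app-1.0.jar", "us"))  # → replication failed
print(diagnose(records, "libs/app-1.1.jar", "us"))  # → no upload found: check the file name
```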
So we really need to dig into the logs, and for that we use a central log analytics solution. Again, it's not a big deal and not reinventing the wheel, but it's something we couldn't live without, and maybe our solution will interest you. We use Elasticsearch. I know there are other great tools out there as well, like Splunk or Sumo Logic, but this really enables us to function. For example, I can look at the last 50 minutes of access logs to see what's going on, and you see all the servers and all the actions at once. Of course, we track all the fields; I'm only showing this much because I don't want to show you confidential data. You can filter and search on these fields, so it really lets us dig into the logs quickly.
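A query for the last 50 minutes of access logs, optionally for one server or status code, translates into Elasticsearch's query DSL roughly as shown below. This Python sketch only builds the request body. The index pattern and the field names `@timestamp`, `server`, and `status` are assumptions about how the logs were indexed, with `server` assumed to be a keyword field so that a `term` filter matches it exactly.

```python
import json
from typing import Any, Dict, List, Optional


def access_log_query(minutes: int, server: Optional[str] = None,
                     status: Optional[int] = None, size: int = 100) -> Dict[str, Any]:
    """Body for POST /<index-pattern>/_search, newest entries first."""
    filters: List[Dict[str, Any]] = [{"range": {"@timestamp": {"gte": f"now-{minutes}m"}}}]
    if server is not None:
        filters.append({"term": {"server": server}})
    if status is not None:
        filters.append({"term": {"status": status}})
    return {
        "query": {"bool": {"filter": filters}},  # filter context: matching only, no scoring
        "sort": [{"@timestamp": {"order": "desc"}}],
        "size": size,
    }


# Hypothetical index pattern covering the access logs of all servers.
INDEX = "artifactory-access-*"
print(f"POST /{INDEX}/_search")
print(json.dumps(access_log_query(50, server="eu"), indent=2))
```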
On top of that, the logs give us some great insights. For example, this dashboard shows interesting data derived from the logs. Some of it is useful for management, and it also gives the big picture, such as the total number of requests coming to all of our systems. You can see that weekends are less busy than weekdays. You can also see the load distribution between the servers and some big numbers. It's great to see the big picture.
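The views mentioned here, requests per weekday and load share per server, are simple aggregations. Elasticsearch or Kibana would normally compute them. The Python sketch below does the same over hypothetical in-memory records with `timestamp` and `server` fields, to show the logic.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Mapping

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # datetime.weekday() order


def requests_per_weekday(records: Iterable[Mapping[str, str]]) -> Dict[str, int]:
    """Request count per weekday, including days without any requests."""
    counts = Counter(WEEKDAYS[datetime.fromisoformat(r["timestamp"]).weekday()] for r in records)
    return {day: counts.get(day, 0) for day in WEEKDAYS}


def load_share(records: Iterable[Mapping[str, str]]) -> Dict[str, float]:
    """Fraction of all requests handled by each server."""
    counts = Counter(r["server"] for r in records)
    total = sum(counts.values())
    return {server: n / total for server, n in sorted(counts.items())}


records = [
    {"timestamp": "2020-07-04T10:00:00", "server": "eu"},  # Saturday
    {"timestamp": "2020-07-06T09:00:00", "server": "eu"},  # Monday
    {"timestamp": "2020-07-06T11:00:00", "server": "us"},  # Monday
    {"timestamp": "2020-07-07T08:00:00", "server": "us"},  # Tuesday
]
print(requests_per_weekday(records))
print(load_share(records))  # → {'eu': 0.5, 'us': 0.5}
```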
We also have this other dashboard, which tracks the weekly number of requests per server. It's a great tool for spotting big shifts in load across our system. For example, one server might get more and more load from many places, like this server here. In that case you might look into the hardware: is it sufficient, or do you need to scale something up?
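Spotting a server whose weekly load keeps growing can be automated with a simple rule. This Python sketch groups hypothetical `timestamp` and `server` records by ISO week. It flags a server when its latest week is at least 1.5 times the average of the preceding weeks. Both the rule and the factor are assumptions for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime
from statistics import mean
from typing import DefaultDict, Dict, Iterable, List, Mapping


def weekly_counts(records: Iterable[Mapping[str, str]]) -> Dict[str, Dict[str, int]]:
    """server -> {ISO week label -> requests}; weeks without requests are absent."""
    per_server: DefaultDict[str, Counter] = defaultdict(Counter)
    for r in records:
        year, week, _ = datetime.fromisoformat(r["timestamp"]).isocalendar()
        per_server[r["server"]][f"{year}-W{week:02d}"] += 1
    return {server: dict(sorted(weeks.items())) for server, weeks in per_server.items()}


def growing_servers(series: Dict[str, List[int]], window: int = 4, factor: float = 1.5) -> List[str]:
    """Servers whose latest weekly count is >= factor x the mean of the earlier weeks in the window."""
    flagged = []
    for server, counts in sorted(series.items()):
        recent = counts[-window:]
        if len(recent) < 2:
            continue  # not enough history to compare
        if recent[-1] >= factor * mean(recent[:-1]):
            flagged.append(server)
    return flagged


print(growing_servers({"eu": [100, 110, 105, 200], "us": [100, 100, 100, 100]}))  # → ['eu']
```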
Another big benefit for us is the top errors report, here showing the top 5 errors, which helps us find big problems proactively.
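A top errors report like this one reduces to counting failed requests by status and path. Here is a minimal Python sketch over hypothetical records with `status` and `path` fields. It treats HTTP status 400 and above as an error, which is an assumption.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple, Union

Record = Mapping[str, Union[int, str]]


def top_errors(records: Iterable[Record], n: int = 5,
               min_status: int = 400) -> List[Tuple[Tuple[int, str], int]]:
    """The n most frequent (status, path) pairs among failed requests, most frequent first."""
    errors = Counter((int(r["status"]), str(r["path"]))
                     for r in records if int(r["status"]) >= min_status)
    return errors.most_common(n)


records = (
    [{"status": 500, "path": "/api/npm/npm-remote/pkg"}] * 3
    + [{"status": 404, "path": "/libs/app-1.0.jar"}] * 2
    + [{"status": 200, "path": "/libs/app-1.1.jar"}]
)
print(top_errors(records))
# → [((500, '/api/npm/npm-remote/pkg'), 3), ((404, '/libs/app-1.0.jar'), 2)]
```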
Again, we couldn't function at this scale without these solutions. I hope this was useful for you. If you have any questions, feel free to contact me during this session or later; I think my contact details are up there. Thank you for watching, and have a great time at swampUP.
