From Developer to Device with OTA Update – Eystein Stenberg, Mender.io

In this session, we will walk through how Mender Hub, Artifactory, and the Mender over-the-air (OTA) update manager can provide turnkey CI/CD enablement for connected device products with OTA capabilities to serve the full lifecycle of IoT.
We will introduce the Yocto Project, which is a build system that creates a customized embedded Linux with software and hardware support for the specified target device. It is the most popular approach to running Linux on IoT devices today.
Mender Hub contains a community repository for supporting over-the-air (OTA) updates on a wide variety of different devices, most of which integrate with the Yocto Project.

CI/CD integration with Mender Hub creates disk images and Mender Artifacts that are automatically built, tested and uploaded to JFrog Artifactory. From JFrog Artifactory, selected device images are automatically uploaded to a Mender server, which deploys them to devices.

This creates a complete end-to-end loop from development sources, including application software and complete Linux customization, to devices. This not only speeds up development cycles as complete builds can be tested frequently, it also ensures devices can be updated remotely over-the-air once they are released to the field.

Video Transcript

Okay everybody, thanks for stopping by. It’s time to start. How many of you went to the keynote with Kit today? The last one? Most of you. So Mender is one of the tools that he showed in the architecture of the autonomous car demo. You can also see it upstairs. It’s a deployment solution. My name is Eystein Stenberg, and we’re going to talk about CI/CD pipelines and how they work in the IoT world.

You might’ve heard about this story, but there are quite a few of them, so they sometimes get missed. This is the story about Fiat Chrysler. They were updating the infotainment system in the car, and what turned out to happen was that many of these cars got into a constant reboot loop on the infotainment system, where you would see your camera, adjust the heat or change the music. So they basically became unusable. You can see this tweet from Uconnect Cares saying that their engineering teams are investigating the cause and working towards a resolution during this stressful period. So the question is, who of you would like to be part of that engineering team? This is what this session is about: how can we do this in a better way, and what does it mean in IoT and automotive?

Quick introduction. My first name is similar to Einstein, which I think is an accident, but it makes it easy to remember that way. I’ve worked about 10 years in systems management, both in the cloud and data center space and now in IoT with Mender, which is an over-the-air updater. It allows you to deploy software from a central location to IoT devices in a robust and secure way, and it’s also open source. The big point is that, well, given that you’re here, I’m probably preaching to the choir, but I think you should use professional tools for the CI/CD pipeline because, as you know, when bad things happen there, things don’t move forward, and it can also lead to customer issues if you deploy bad software or deploy software in a bad way, as we saw earlier. You would also save development time doing it this way.

If you look at the oversimplified IoT development process, this is how it looks. First we’re doing prototyping, typically on some Raspberry Pi board. Maybe it’s a [inaudible 00:02:58], some development board where you can easily get your code out and that has all the hardware features you need. Then you enter production design. At this point you’re asking: do I need Wi-Fi on the board, or the product, that I will sell to my customers? Do I need some other network connectivity? Do I need the HDMI outputs? And why are you doing this? Why don’t you just use the same board? Well, the reason is that you’re going to save costs. If you can save 50 cents by removing the Wi-Fi hardware module, if you don’t need it, and you’re making a hundred thousand boards, that’s $50,000. If you’re making more, that’s more, obviously.

So this is why you enter this production design stage where you try to strip down the costs. That’s also where you start to look at the operating system: what kind of software do I have, how much storage do I have? So you go through that and, I guess as most technology development happens, you typically end up in some kind of release deadline panic. That’s what we’ve seen at least. There’s pressure to get the product out, and there’s also manufacturing involved here. In many ways you probably think of IoT as the next generation of technology, and there’s a lot of talk about it from Gartner and so on, but in some ways it’s a step backwards as well. The reason is that now you’re dealing with physical products again, so you have to work with assembly lines and maybe set up relationships in China to manufacture things in a cheap way. So that’s something to keep in mind. You obviously have to align this and plan it well for going to mass production, and that means there’s usually a pretty hard deadline for when you have to finish the development.

Of course, as engineers you know that there will be bugs as well, and you know that when you enter the end of the cycle. We spoke to more than 100 embedded developers before we started to create Mender, and what typically happens is that you create some kind of backdoor, right? So I can update the device in some simple way. Maybe SSH is open and so on, so it’s a quick way for you to be able to deploy new releases that’s not very planned out. Of course, you should be thinking about CI/CD during the design phase. That’s definitely part of what JFrog is doing as well.

The purpose of doing that is that you get faster iteration cycles and a more robust pipeline. So there is maybe a little bit of upfront investment, but if you design it early, you can use it during testing as well. You can set up a test environment and deploy new software there, new releases, maybe nightly builds, and make sure that this works during the entire product development life cycle. And of course you can use the same system once you go mass market. And you would avoid the recalls and the stress that, as you saw in the first picture there, the engineering team would experience.

Yeah. So how many of you would say you’re mostly involved with cloud infrastructure or cloud application development? Yeah. How many of you are involved with IoT or embedded development? Okay, one, two. So yeah, most of you are working on the cloud side. For you, I guess this is a quite familiar picture. In the CI/CD pipeline, some developer commits some code that ends up on a CI server. You build a Docker image out of it, and then you have some integration with Kubernetes. This is where you pull that built image from the Docker registry and deploy a new pod in Kubernetes, basically. So that’s kind of the state of the art in the cloud or data center world today, and in web services. So what you might ask is, can we just repeat this? Now we’re working with IoT devices. Can we do this the same way?

Before we answer that, let’s look at, I guess, the most simplified form of the building blocks. You would have some source code, obviously. Otherwise there’s no point to it all. Then you would need some continuous integration service, so Jenkins or one of these tools. What it does is pull together the different sources, build them and test them, and then you get some kind of build artifact. That could be a download if you’re doing just continuous integration, but if you’re doing continuous delivery as well, you would need a deployment system: how do you get this software out to your customers? And regardless, you would need some way of doing this even if it’s not fully automated. Otherwise, there’s usually not that much point in developing the software in the first place. You can of course make this a bit more advanced if you have a large scale of devices: first you deploy to some kind of staging environment, then you deploy to some of your happy customers who like the newest features, and then you go to the entire environment. If you have a million devices out there, you probably don’t want to deploy to them all at the same time. So this is just one way of mitigating the risk of deployments.
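As a rough illustration of the staged rollout just described (staging first, then a group of early adopters, then everyone else), here is a minimal Python sketch. It is not taken from the talk or from Mender; the function name `plan_phases`, the phase names and the group sizes are all assumptions made for the example.

```python
from typing import Dict, List, Sequence


def plan_phases(
    device_ids: Sequence[str], staging: int, early_fraction: float
) -> Dict[str, List[str]]:
    """Split a fleet into three rollout phases, deployed in order.

    `staging` is the number of devices in the staging environment and
    `early_fraction` is the share of the remaining fleet that receives the
    release before everyone else. Both values are illustrative assumptions.
    """
    if staging < 0 or not 0.0 <= early_fraction <= 1.0:
        raise ValueError("staging must be >= 0 and early_fraction within [0, 1]")
    ids = list(device_ids)
    remaining = ids[staging:]
    early_count = int(len(remaining) * early_fraction)
    return {
        "staging": ids[:staging],
        "early_adopters": remaining[:early_count],
        "everyone_else": remaining[early_count:],
    }


# Hypothetical fleet of 1,000 devices.
phases = plan_phases(
    [f"device-{i}" for i in range(1000)], staging=10, early_fraction=0.1
)
for name, group in phases.items():
    print(name, len(group))
```

A deployment tool would normally only move on to the next phase after the previous one has reported success, which is what limits the damage of a bad release.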

So let’s dive one step deeper into what this source code is in IoT. If you’re doing cloud web services, the source code is typically the application, right? You have some service that runs inside Docker, or it could be a set of web files that the browser will display, or some kind of service that provides an API. So you think about the application. But if you look at IoT, the source code is actually a lot more than the application, because now you’re delivering this entire device, right? It’s a piece of hardware, and you have to have the full stack of software on that piece of hardware in order to run your application. And this is where it gets a bit complicated in IoT, to be honest, because there are so many choices and so many vendors. It’s a very complex ecosystem. If you look at the bottom layer, you definitely need some operating system. If you’re using Linux, you’ll probably want to look at the Yocto Project, which Kit mentioned in his keynote as well, which [inaudible 00:11:02] Grade Linux is based on, for example. Then on top of that you need some board support package, as they’re called. This is basically a set of drivers, or a way for your operating system to support all the peripherals and enable your device to boot, basically.

So yeah, display drivers, network drivers. These board support packages are created by the device manufacturer, so typically whoever you’re buying the hardware from. On top of that, you need some system configuration. So how does your device boot, what are the boot parameters, how do you start your application, for example. Then you probably have some runtime libraries as well that your application needs, depending on what language it is and so on. And all this is source code, typically. So you can see it gets a little bit complicated. Just a quick note on Yocto: how many of you have heard about Yocto or know it a little bit? Okay. About half or one third of you. It’s a way to build a Linux distribution. People frequently confuse it with a Linux distribution like Ubuntu or [inaudible 00:12:25], but it’s rather a Linux distribution builder or, as they put it, it’s not an embedded Linux distribution.

It creates a custom one for you. And this is probably the most popular way to run Linux on devices, in terms of the number of devices out there with Linux. If you look at how many are built using Yocto versus other Linux-based operating systems, Yocto is probably the most popular. It’s hard to get exact stats on this, obviously, but it’s very widely adopted. But now you’re building Linux from scratch, right? You have to get systemd, the Linux kernel, all these packages, the boot loader, from source code, and you’re going to build it all. So this can take hours, obviously. You have to fetch it from the internet and then you have to compile it for that board, or the board architecture, that you’re targeting.

So how can it look? If we put all this together, you can see some logos here. If you look at just the continuous integration piece, this is how it can look. You have the application sources, configuration, BSP software and system software, as we mentioned. You can see that these components are all from different companies. The only thing that’s coming from your company is basically the application source and probably the configuration of the system. And this could be different types of developers. For the system configuration you would probably need more low-level skill sets: know a little bit about boot loaders and drivers and partitioning and this kind of thing. The application developer probably doesn’t have to know all that. The BSP can come from your board vendor, so some big ones are NXP, Samsung, Nvidia, Qualcomm. You would put all these sources into your CI server, which can be based on Yocto and Jenkins, for example, to execute it all. And then you get the binary outputs, which are also not just one type. It can be a full image, or it can be a package or set of packages, or containers maybe, depending on how you run your application.

So hopefully this convinced you a little bit that building software for IoT devices is quite complicated. You have different inputs from different companies, and you also have a lot of different types of outputs, so it looks a little bit different than the cloud case. Then obviously the last part was the deployment, where you use Kubernetes, get a new pod, pull from the Docker registry and so on. But you cannot do this. I know there are some projects that are looking at running lightweight Kubernetes on IoT devices, but I don’t think this will be the way, because the IoT environment is so much different from the cloud environment. I’ll tell you a little bit about it, but it’s very intermittent and unreliable. It’s not like your EC2 instances, and there are different types of software to deploy. So I would say that it’s not possible to reuse the same tools and processes, because all the building blocks we looked at are different. The sources are different. The continuous integration system is different. And the way you’re doing deployments needs to be different too. And why is that?

So if you look at the IoT environment and the properties that are important for running and deploying software there, you probably have a sense for this, but the devices are very remote. Like the keynote mentioned, they could be cars. So where are the cars? They’re across California, across the US, across the entire world. And what automotive is struggling with is expensive recalls, obviously. If you have to bring all these cars back, this is going to cost a lot. But that’s just automotive. It could be in agriculture: it could be sensors that are all over the earth, measuring the moisture in the soil so you can optimize the fertilizer, right? If something goes wrong and you have to get to the device, you’re probably going to be bankrupt if you have lots of them. At the same time, they’re expected to last for a long time. What’s the expected lifetime of a car? Maybe 15 years or something like that, which is different from a cloud infrastructure server and so on. You replace those every two or three years, maybe.

And then of course the risk is also increased by the unreliability of the environment. You have the power situation first. That’s the first one you’ll hit if you’re doing software updates for the first time. It heavily depends on the product, but some of these devices are running on a battery, and when you’re doing a software update you’re draining more battery. And what happens if you just lose power during the update process? You boot the device again, and will it boot, and is it going to show the end-user application? It could also be unplugged; a car you can turn off or on at any time. Then we have the network situation as well: I would say around two thirds of it is wireless. And you know wireless connectivity, you can’t really rely on it at all times, so you can lose connectivity at any time. The bandwidth is also low, different from the data center, where I don’t know, maybe 10 gigabit is the standard right now. And it’s insecure because of the wireless nature. You must, or you should, assume that somebody is able to observe the packets that are going over it, and if you’re doing a software update, what are they going to see? Can they influence it, or can they see what you’re updating?

These are things you have to think about more in the IoT environment, at least. Just a quick note on the network. Often we think, okay, technology is going to evolve, right? The network is going to be faster and we will get more storage, and eventually these IoT devices will become like small data centers, right? So we just wait a little bit, Moore’s law will come into play, and we can use the same tools again. But this is definitely not true for the network, I would say. And the reason is that there are very different use cases. Think about, it’s not really an IoT device, but an expensive phone. It has a fast network. So 5G, I don’t know what the throughput is on that, maybe some of you do, but it’s pretty fast compared with the typical IoT device.

That device may be relying on the older 3G standard or on some type of LPWAN, a low-power wide-area network. So my point is that the use case for the phone requires a fast network, because you want your users, your customers, to be able to use Instagram, for example, to post high-resolution pictures, or look at YouTube, or browse the web and so on. So it makes sense that we’re pushing for faster and faster networks in the phone case. But if you look at the IoT device, are you going to watch YouTube there, or post pictures of cats or whatever you want to post? No, you’re not. The use case for an IoT device is to take maybe a simple data point, from agriculture for example, and send it maybe once a day. So it’s 50 bytes a day. But what’s important is that you have a high degree of connectivity. You want that data point to arrive, but you don’t care if it’s really fast or not, as long as it happens. So connectivity is more important than data speed in IoT. That’s why I think the network will continue to be slow in the IoT case.
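To put rough numbers on this, the sketch below compares how long the daily 50-byte data point and a full software image would take to send over a slow link. Only the 50 bytes comes from the talk; the 50 kbit/s link speed and the 100 MB image size are assumptions chosen purely for illustration, and protocol overhead and retransmissions are ignored.

```python
def transfer_time_seconds(size_bytes: int, link_bits_per_second: float) -> float:
    """Idealized transfer time: payload bits divided by link speed."""
    if link_bits_per_second <= 0:
        raise ValueError("link speed must be positive")
    return size_bytes * 8 / link_bits_per_second


LINK_BPS = 50_000            # assumed 50 kbit/s low-bandwidth link
DATA_POINT_BYTES = 50        # daily sensor reading, as mentioned in the talk
IMAGE_BYTES = 100_000_000    # assumed 100 MB full system image

data_point_s = transfer_time_seconds(DATA_POINT_BYTES, LINK_BPS)
image_s = transfer_time_seconds(IMAGE_BYTES, LINK_BPS)

print(f"data point: {data_point_s:.3f} s")
print(f"full image: {image_s / 3600:.1f} h")
```

Under these assumptions the link is more than adequate for telemetry, while a full image update takes hours, which is why update size matters on such networks.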

So we’ll dive a little bit deeper into the OTA part because, yeah, that’s what we’re involved with. I know this very well and I think it’s quite an interesting challenge as well. Hopefully you’ll think so too. The default case is that people start thinking, okay, when I’m updating the devices, how hard can it be? Right? I just copy some binaries from one place to my device, use curl and create some scripts, and it will be fine. But here is what we found; I think this sample is from about 30 or 40 embedded engineers, those that actually did it. About half of the IoT devices that were built at this point, it was a couple of years ago I think, didn’t have a way to be updated. The other half did, and the way they did it was homegrown. That’s also what motivated us to create Mender, because everybody’s reinventing the wheel here. What people found is that they would do it once, it would take probably 6 to 12 months to get a decent solution for deploying, and nobody would want to do it again. So it’s costly and time-consuming, and of course you get into some more interesting issues down the road with security and robustness and these kinds of things, and it’s not really what you’re developing in the first place.

So, in terms of requirements for over-the-air updates: this obviously doesn’t relate only to Mender, but how are you going to do over-the-air updates for IoT in a good way? The most important one, I think, and probably frequently overlooked, although Kit did talk about this part, is robustness. If you lose power, you want the device to come back up. If you accidentally deploy incompatible software to a device, you don’t want that to be allowed. You want to be able to roll back if anything goes wrong, and you want the ability to run some sanity checks, not just that the device is booting or the kernel is loaded, but that the end-user application is actually running. Otherwise, you want to roll back again. There are a lot of stories out there, so I could spend a lot of time on use cases or case studies.

But there’s one example here: a smart lock device that was recommended by Airbnb. They had software called LockState 7i, so that was for the new generation of the locks, and it was deployed to the old hardware, the old revision. The result was that the lock didn’t work, so people couldn’t get into their houses, or Airbnb guests were locked out, and they had to send the lock back to the manufacturer and get a replacement, which would take three or four weeks, which is a bit annoying if you can’t get into your house.
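The kind of compatibility check that prevents this class of failure can be sketched in a few lines of Python. This is an illustration only, not the lock vendor’s or Mender’s actual code; the `Artifact` class, its fields and the device type names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Artifact:
    """Hypothetical update package that declares which hardware it supports."""
    name: str
    compatible_device_types: FrozenSet[str]


def is_compatible(artifact: Artifact, device_type: str) -> bool:
    """Return True only if the artifact explicitly lists this device type."""
    return device_type in artifact.compatible_device_types


# Firmware built for the newer hardware generation only (names are made up).
firmware = Artifact(
    name="lock-firmware-2.0",
    compatible_device_types=frozenset({"lock-gen2"}),
)

print(is_compatible(firmware, "lock-gen2"))
print(is_compatible(firmware, "lock-gen1"))
```

The important design choice is that the device refuses anything that does not explicitly name its hardware, rather than installing whatever the server sends.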

And then of course you have security. I mentioned this before with wireless. So how are you doing communication: using TLS, or something else that’s well proven? You get into key management issues: how you identify devices and how you exchange keys, rotate keys maybe. Authenticity, or code signing, is another big one. Do you know that the code you’re about to deploy is actually coming from your developer or from a trusted source, or could somebody have modified it anywhere in the chain, basically in the CI/CD chain? Fiat Chrysler is an example of the last one. They did not do code signing properly. How many of you have heard about this story of Fiat Chrysler, maybe three years ago? So just three of you. What happened was that some researchers took a look at the Fiat Chrysler car, and their goal was to try to find vulnerabilities and figure out how the stack worked.

None of this was documented, obviously, and they were probably using quite old technology. There were two researchers, and I think they spent a year on this problem or something like that. After that they managed to actually take over a car remotely, so they could drive it, drive it off the road, stop the car. So it’s a fairly scary story. There are some fun videos about what they did with a reporter driving the car, if you want to look it up. But Fiat Chrysler failed to verify the authenticity of the updates. So the researchers were able to deploy new software that was not coming from Fiat Chrysler, and that’s how they could take over the car, or that was one part of it. It’s a long chain of events here.
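The two checks discussed here, integrity (did the payload change in transit?) and authenticity (did it come from a trusted source?), can be sketched with the Python standard library. Note the substitution: real code signing normally uses asymmetric signatures, so that devices hold only a public key. To stay within the standard library, this sketch uses an HMAC with a shared secret as a stand-in. The function names and the demo key are assumptions for the example.

```python
import hashlib
import hmac


def sha256_hex(payload: bytes) -> str:
    """Checksum used for the integrity check."""
    return hashlib.sha256(payload).hexdigest()


def sign(payload: bytes, key: bytes) -> str:
    """Stand-in for code signing: HMAC-SHA256 over the payload."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_update(
    payload: bytes, expected_sha256: str, signature: str, key: bytes
) -> bool:
    """Accept the update only if both integrity and authenticity checks pass."""
    if not hmac.compare_digest(sha256_hex(payload), expected_sha256):
        return False  # corrupted or altered in transit
    # Constant-time comparison avoids leaking how much of the signature matched.
    return hmac.compare_digest(sign(payload, key), signature)


key = b"demo-key-not-for-production"   # hypothetical shared secret
image = b"firmware image bytes"
checksum, signature = sha256_hex(image), sign(image, key)

print(verify_update(image, checksum, signature, key))
print(verify_update(image + b"!", checksum, signature, key))
```

A device that installs updates without a check of this kind will run whatever it is sent, which is the gap described in the story above.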

And then there are some other requirements you probably want. The OTA update manager needs to support the operating system that you’re using and the hardware that you’re using, and be able to integrate with your existing development tools. What kind of language you’re writing your application in shouldn’t matter, and so on. And it should be easy to get started because, as I mentioned in the first slide, usually when most people start to think about over-the-air updates and CI/CD for IoT, it’s too late. So you probably don’t want to spend a lot of time on this. And then of course you have bandwidth. For automotive, I think 3G is probably the most common way to be connected, and you have Wi-Fi as well, but about one third of IoT devices are on some kind of slow wireless connectivity. So you need some support for that as well.

How can you transfer updates over low-bandwidth networks? There are some technologies that can be used for that. And then of course there’s the downtime during the update process. Is the device going to be unusable while the update is happening, like the story of the car update that Kit mentioned as well, where you would sit for one hour and just wait for it? Or can the installation happen while the device is being used, so that maybe it’s just the reboot needed to actually apply the update where the downtime happens?

And of course you want some way of being able to manage the updates. Unfortunately, a very common way to do it is with a USB stick. But it gets quite annoying, if you’re trying to do continuous deployment, to send out a lot of people with USB sticks. So you need some kind of central management server, or some way to deploy this en masse. And with that there’s a set of other use cases as well. You want to be able to group the devices by customer, for example, or by people in California, so you can reduce the risk of any failures. You don’t have to deploy to everybody at the same time. The more advanced form of that is called campaign management. And then of course you have reporting, where, okay, I just did a deployment, did it work or not?

How many devices did it work for, and what happened to those where it did not work? So you need log management and diagnostics as well. Hopefully, as you can see, this is not a very simple problem. It’s not just about copying bytes from one place to the other and hoping for the best. So yeah, this is a bit more detailed, but this is the workflow that we ended up with in Mender, based on all these requirements: what is the updater supposed to do on the device? Before we dive into this, again, what do you think about when you start this process? Okay, how can I deploy updates? You think about download, which is at the top, and install, probably. So this is easy, right? But then you notice after a while that there are a lot of other things you need around that as well.

So first you need to detect that an update is there. How do you do that? Maybe the user wants to do an update, or it’s automatic based on the server having a new version, which is more likely. Then you need a compatibility check. Hardware and software, are they compatible? Am I trying to run Arm software on an x86 device, something basic like that. Of course, you need to download it through a secure channel. Do an integrity check: did something happen due to connectivity issues? Did somebody alter it? So I need to check the signature, the authenticity. Maybe you encrypted it because there is some confidential information, so you need to decrypt it, and extract it depending on the format of the package. Maybe you want to run some pre-install actions, because your application is now using a different format for your configuration file, or you’re doing some migration. Finally you get to do the installation, and then maybe there are some post-install actions.

You restart either the application or the device when you’re done, to apply the update for the end user. And then you need to do some sanity checks, because is the application working or not? That’s quite an important question after you have deployed an update. Otherwise you get a lot of angry customers. And then depending on that: is it working? That’s fine. But what if it is not working? How do you recover? This is where you get to rollback. Okay, so do I have two copies of the software, where one is the latest known good version? This is what Mender does as well, so we do have automated rollbacks, but regardless, you need some way to recover. And this is one reference pipeline. As Kit mentioned as well, there is a demo at the automotive booth, so I basically filled in the technologies that are being used there to give you a quick overview of how it all fits together.
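The update steps walked through above (compatibility check, install, restart, sanity check, then commit or roll back) can be sketched as a small Python model with two software copies. This is a simplified illustration of the general dual-copy idea, not Mender’s implementation; the `Device` class, the slot names, the version strings and the status strings are all hypothetical, and download, signature checking and pre/post-install actions are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set


@dataclass
class Device:
    """Hypothetical device with two software slots (A/B), one of them active."""
    device_type: str
    active: str = "A"
    slots: Dict[str, Optional[str]] = field(
        default_factory=lambda: {"A": "v1", "B": None}
    )

    @property
    def inactive(self) -> str:
        return "B" if self.active == "A" else "A"

    @property
    def running_version(self) -> Optional[str]:
        return self.slots[self.active]


def apply_update(
    device: Device,
    version: str,
    compatible_types: Set[str],
    sanity_check: Callable[[Device], bool],
) -> str:
    """Install into the inactive slot, switch over, and roll back on failure."""
    # 1. Compatibility check: refuse software built for other hardware.
    if device.device_type not in compatible_types:
        return "rejected"
    # 2. Install into the inactive slot; the known good copy stays untouched.
    previous, target = device.active, device.inactive
    device.slots[target] = version
    # 3. "Reboot" into the newly written slot.
    device.active = target
    # 4. Sanity check: is the end-user application actually working?
    if sanity_check(device):
        return "committed"
    # 5. Roll back to the last known good slot.
    device.active = previous
    return "rolled_back"


healthy = Device(device_type="raspberrypi3")
print(apply_update(healthy, "v2", {"raspberrypi3"}, lambda d: True),
      healthy.running_version)

broken = Device(device_type="raspberrypi3")
print(apply_update(broken, "v2", {"raspberrypi3"}, lambda d: False),
      broken.running_version)
```

Because the new version is never written over the running one, a failed sanity check, or in a real device a power loss mid-install, still leaves a bootable known good copy.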

So they’re using Raspberry Pi. There’s something called Mender Hub, which is a community-driven project to add over-the-air update support for a lot of boards. So there’s a board support package available there for Raspberry Pi. They use Mender as part of the system software as well, and they use Yocto, Automotive Grade Linux in particular, which I guess you could call a flavor of Yocto, and LAVA, which is a testing framework, because testing also becomes quite interesting when you’re dealing with hardware. It’s not just starting a Docker container and running the test scripts anymore. You have to know, given all this code, will the device boot, and how can you test that? You need the device, you need to put the new software on the device, re-imaging it with the new image, and then you need to boot it. So there’s a lot of hardware involved, and you have to work out how to get underneath the operating system. So this is quite complicated. And this is where LAVA comes in as well. They have ways to test, or it’s a test for [inaudible 00:34:41], where you can actually test software that’s closer to the hardware, like the kernel and things like that.

So yeah, then they upload the full images to Artifactory [inaudible 00:34:54] in Xray, you probably saw that during the video as well, and there is an integration between JFrog Artifactory and Mender. As you know, there are some workflows you can set up in Artifactory in order to mark a given release, or a given artifact, as release-ready. And that’s also where you can upload software with the integration to Mender. And then finally they have the RC car at the bottom there, where you have a Mender client that will pull from the Mender server. So that’s my last slide. I really hope that you learned a little bit today about IoT, and that you will think about maybe not trying to reinvent the wheel and rather try to use proven components. I would also advise you to go and look at the automotive booth upstairs from JFrog if you want to learn more about the reference implementation of a CI/CD pipeline. I was also told that we are not supposed to ask questions here, so we are going to go upstairs, I think. Yep. Thanks for coming.
