The next Shipyard Session, “E2E Testing Before Merge,” will be held on Wednesday, April 27th at 2pm ET. We’ll be joined by special guest Amrisha Sinha of MaestroQA to hear how her team has successfully implemented end-to-end testing using Cypress, CircleCI, and Shipyard. Register here and make sure to bring your questions!
The buzz around ephemeral environments has been growing over the last year, and it’s a big part of what we are working on at Shipyard. We decided to start the Shipyard Series (formerly known as…shudder…webinars) to help surface the power of ephemeral environments and show how they can directly benefit software and product teams.
For our first session, we focused on why teams should incorporate ephemeral environments into their SDLC and how they can pragmatically do so. We are sharing this session on our newly created YouTube channel, which will serve as a resource advocating for faster, safer application development through ephemeral environments!
Summary of things we covered:
- What is an ephemeral environment?
- A brief history of containerization and virtualization.
- Why the rest of us (not just Google) can use ephemeral environments.
- The role of modern DevOps teams and the resource constraints on them.
- Modern development pipelines and workflows.
- A few immediate benefits of incorporating ephemeral environments into your existing workflows.
- Challenges around implementing and maintaining ephemeral environments.
- Obligatory Shipyard Demo!
As always, check out ephemeralenvironments.io for your guide to getting your organization ready for ephemeral environments!
Don’t forget to register for our next Shipyard Session!
We are providing the transcript below as well for those who like to read!
Ephemeral Environments: “Why and how?”
0:03 Benjie: We’re going to get started. Ephemeral environments: why and how? We’re going to talk about this paradigm that is starting to get a lot of momentum, why that makes sense, and how you can do it.
I’m Benjie, CEO and Co-Founder of Shipyard. I’ve been building DevOps pipelines with my Co-Founder Peter for a very long time. We were high-priced Kubernetes consultants, and the initial version of Shipyard was actually built as an exoskeleton to make our lives easier, because we kept building the same thing over and over again for every organization.
We’re very fortunate to be joined by Ashley, who’s going to be moderating a Q&A at the end, so please feel free to ask questions in the chat.
I’m going to activate this poll. It’s just to let me know what your favorite color is. There’s a reason for this, I promise. You guys can all vote, and we’ll leave it open for a while.
1:32 Benjie: So the first thing is, “what is an ephemeral environment?” The way that we define it at Shipyard is a short-lived, fully encapsulated deployment of an application.
It means that your application is fully independent: it has the code, it has all your services, it has state. It has all these really important things that you need to be conscientious of.
Challengingly, it also includes its third-party dependencies, so it doesn’t depend on anything outside of itself. You can do everything with an ephemeral environment that you can do with production.
2:13 Benjie: Before we dive too far in, let’s go for a history lesson.
A lot has happened in the last two decades of computing. The big turning point for all of this was around 2004, when Intel added hardware-assisted virtualization for x86.
Prior to that moment, you had software virtualization: the host operating system was interpreting whatever operating system ran on top of it. Very slow, a lot of latency there.
Intel adds a new ability. Let’s say I’m running Windows (which I haven’t done for years) and I want to run Linux in a virtual machine. That virtual machine is actually talking directly to my CPU, directly to my RAM, directly to my hard disk, so there’s no latency from the operating system side.
That’s where virtual machines come from, and that’s where the cloud comes from. There are a few problems, the biggest one being that they’re kind of slow, because you have to load the entire operating system as well as your application.
Unbeknownst to us, in 2007 or so, cgroups get added to the mainline Linux kernel.
This is the first time that containers are being used heavily. Google’s behind this for the most part, and what cgroups do, as the basis of modern-day containers, is basically allow you to run the application code while sharing the operating system’s resources. That’s really important. We don’t really notice it as a community outside of the bigger technology organizations, but it happens.
Also, under the hood you now have all these applications, all these containers. Who’s going to make sure that they work? Who’s going to make sure that when they die, they get spun back up or spun down, and all that stuff?
That’s called orchestration and scheduling. Google creates something called Borg that does that internally for them. Fast-forward, and they’ve succeeded using commodity hardware with amazing software that makes their infrastructure elastic.
In 2013, dotCloud comes around and gives us an API for cgroups and containers themselves. That’s where Docker comes from.
There’s one more missing piece: in 2015, Google releases Kubernetes on the world, basically an open source version of Borg. They take the best practices they learned from pretty hardcore scheduling and orchestration and bring them in there.
5:03 Benjie: That brings us to today. We’ve got Docker, we’ve got Kubernetes, we’ve got all this really cool open and closed source stuff, but how can we use that technology to make us faster? Frankly, it can be overwhelming.
Going back a second, we’re going to talk about continuous application development. This is an approximation of the agile method. Basically, product defines a feature. An engineering manager or a PM plans and assigns it, developers work on those features, reviews happen, QA happens, end-to-end testing happens (hopefully), then eventually it gets deployed to prod. If we’re being honest, there’s a bit of praying that happens on every prod deploy.
Then comes the DevOps team, the people that aren’t praying but are the ones trying to make this stuff work. The responsibility of the DevOps team is to actually package, build, and deploy the application that these other team members are working on.
They’re creating and maintaining all this infrastructure: how you turn these things on and how you turn them off. Ultimately, they’re responsible for all the environments that you need to build, test, and run.
6:16 Benjie: Here’s the typical long-lived environment workflow that we see. You have your local environment as a developer. I write code, then I push my code and it gets onto development.
Typically, there’s a lot of UAT and user testing there, after which it gets pushed onto staging. If we’re being honest, that’s probably where most of the testing happens. If you’ve got amazing tools in your CI/CD pipeline, eventually it passes and gets onto production.
So what happens when a dev pushes something and it breaks one of these environments?
First off, we have to keep in mind that environments that aren’t production are supposed to be fragile. You’re working, you’re iterating, you’re going to break things. The problem is that when you break something, it cascades quickly to the rest of your entire team and you start blocking people.
Your review process grinds to a halt. In some instances, production actually goes down. More importantly, you cannot fix production because these other environments are also broken. So, a DevOps person becomes an SRE for every environment.
8:04 Benjie: Ephemeral environments, what are they? They’re short-lived, fully encapsulated environments that give us the ability to understand where the state of an application is at any given time.
You want to break things; you don’t want to block things. With the ephemeral environment paradigm, both can be true.
There’s a reason why we don’t see broken buttons in Facebook or Google products. There are plenty of other issues that happen, but the product itself is bug-free when new features come out. The reason is ephemeral environments.
9:00 Benjie: With ephemeral environments, everybody gets as many environments as they want: QA, Dev, Product, Testing, your CEO, your CTO, whoever it is. The environments are theirs to break, and guess what? If I break something, it doesn’t affect my co-worker or anyone else on the dev team.
You’re able to test production-like environments and find the problems before they get to prod, and before they escalate onto your other environments.
Ephemeral environments also enable asynchronous collaboration. Instead of setting up a meeting to screen-share a work-in-progress feature, as a developer I can just share a link. That gives others a way to access the new feature on an ephemeral environment, and everybody’s happy.
9:42 Benjie: You want to have users testing this stuff. The sooner you do those tests, the sooner you’re going to solve all kinds of problems. We’ll talk about that in a later presentation.
You’re going to shift left with ephemeral environments. You can have a setup where, as soon as you push a PR from local development, an environment is created. Users get to test and find bugs earlier than ever before.
Ephemeral environments also let developers get feedback earlier, while the work is still fresh, instead of having to switch mental context after they’ve moved on to the next feature. You get features into people’s hands earlier, which means they can give you feedback earlier, which ultimately leads to faster deployment times.
10:58 Benjie: At Shipyard, we’ve seen customers reach 50% velocity gains in getting things out the door using ephemeral environments, along with better stability and more bugs caught.
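As a rough illustration of the PR-triggered setup just described, here is a minimal Python sketch that maps GitHub `pull_request` webhook actions to environment lifecycle steps. The action names (`opened`, `reopened`, `synchronize`, `closed`) are the ones GitHub sends in its webhook payloads; the `EnvironmentRegistry` class and its messages are hypothetical stand-ins for whatever actually builds and tears down environments. This is not how Shipyard implements it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EnvironmentRegistry:
    """Hypothetical registry: one ephemeral environment per open pull request."""

    # PR number -> commit SHA currently deployed to that PR's environment
    envs: dict[int, str] = field(default_factory=dict)

    def handle_pull_request(self, action: str, number: int, head_sha: str) -> str:
        """Map a GitHub pull_request webhook action to a lifecycle step."""
        if action in ("opened", "reopened"):
            self.envs[number] = head_sha
            return f"create env for PR #{number} at {head_sha}"
        if action == "synchronize":  # new commits were pushed to the PR branch
            self.envs[number] = head_sha
            return f"rebuild env for PR #{number} at {head_sha}"
        if action == "closed":  # merged or closed: the environment goes away
            self.envs.pop(number, None)
            return f"tear down env for PR #{number}"
        return "ignore"  # e.g. "labeled" or "edited" need no new environment


registry = EnvironmentRegistry()
print(registry.handle_pull_request("opened", 42, "a1b2c3d"))
print(registry.handle_pull_request("synchronize", 42, "e4f5a6b"))
print(registry.handle_pull_request("closed", 42, "e4f5a6b"))
```

Keying everything on the PR number is what keeps each environment isolated: a broken build on one PR only ever touches that PR’s entry.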
When you contain your fires, your DevOps people can actually be SREs for the things that they’re supposed to do (production) and developers can fix their own problems on the ephemeral environment side. That’s pretty cool.
This is really about getting as much exposure to in-progress features and things that we think are done as early as possible. That way, key stakeholders can give feedback and figure out if there’s a problem before your end user does.
Now your DevOps person can focus on making sure that production isn’t on fire. That’s a huge benefit.
12:05 Benjie: As for the how, here’s what you need to do. First off, you need to think about your code: what does your application look like? This is solved with GitOps: you push code to your source control, and an action happens. So, I push my code, something gets built, and something gets pushed to a deployment. Great!
A lot of people are already doing that, but it gets a little more complicated. You also have services, databases, caching, and message queues. You’ve got to make sure that those are encapsulated as well, so that when you turn on your ephemeral environment, you have all your code and services.
Further on, you have to worry about state. That means that from environment to environment, you don’t have to go back in and click on all the buttons. You want to be able to copy your state across multiple environments, making it portable so that people can test.
13:10 Benjie: The next big thing for going ephemeral is third-party dependencies and integrations. Say you work with GitHub: how do we simulate getting those pull requests and webhooks? G Suite is another great example. The more you mock, the more you miss, so you really want your third-party dependencies to be encapsulated.
Then, of course, and always number one, is security. Who has access to this stuff? You should not have PII in your ephemeral environments. There are exceptions, but that aside, there are secrets and keys you need to interact with these third-party services. You have to gate access to this. These are internal environments, not necessarily things that you want everybody seeing.
14:52 Benjie: The last piece of going ephemeral is making these things on-demand. Elastic compute is really cool, but if you always leave it on, it’s not really elastic; it’s just costing more money. The beautiful thing about the combination of Kubernetes, Docker, and containerization is that it’s fast: it only takes a little bit of time to turn an environment on or off, which adds up to a huge amount of savings.
That’s a lot of stuff. Isn’t there a platform that could manage this for me? There is; it’s called Shipyard. That’s where I work, and we aim to solve as many of those problems as we can and empower you as a user as much as possible.
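Portable state and “no PII” pull in the same direction: the snapshot you copy into every environment should be scrubbed first. Below is a minimal Python sketch of that step, assuming state is exported as a list of row dictionaries. The field names in `PII_FIELDS` and the `SCRUB_KEY` value are hypothetical; a keyed hash (`hmac`) is used instead of a plain hash so that low-entropy values like emails can’t be recovered by simply hashing guesses.

```python
from __future__ import annotations

import hashlib
import hmac
from copy import deepcopy

# Hypothetical: column names treated as PII in this example.
PII_FIELDS = {"email", "full_name", "phone"}
# Hypothetical secret; in practice it would come from a secret store, not source code.
SCRUB_KEY = b"replace-me"


def pseudonymize(value: str) -> str:
    """Deterministically replace a real value with a placeholder.

    The same input always yields the same output, so rows that shared an
    email before scrubbing still share one afterwards.
    """
    digest = hmac.new(SCRUB_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"redacted-{digest[:10]}"


def scrub_snapshot(rows: list[dict]) -> list[dict]:
    """Return a PII-free copy of a state snapshot; the input is left untouched."""
    clean = deepcopy(rows)
    for row in clean:
        for key in PII_FIELDS:
            if row.get(key) is not None:
                row[key] = pseudonymize(str(row[key]))
    return clean


snapshot = [
    {"id": 1, "email": "ada@example.com", "plan": "pro"},
    {"id": 2, "email": "ada@example.com", "plan": "free", "phone": None},
]
seed = scrub_snapshot(snapshot)
print(seed[0]["email"] == seed[1]["email"])  # True: placeholders stay consistent
```

Because the placeholders are deterministic, the same scrubbed seed can be loaded into any number of ephemeral environments and everyone tests against identical data.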
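To put a rough number on those savings, here is a back-of-the-envelope calculation. It assumes, purely for illustration, that compute cost scales with hours running and that an environment is only awake 9am to 9pm on weekdays. The nightly window matches the sleep schedule mentioned in the Q&A; also sleeping on weekends is an extra assumption.

```latex
\text{awake hours per week} = 12\ \tfrac{\text{h}}{\text{day}} \times 5\ \text{days} = 60\ \text{h},
\qquad
\frac{60\ \text{h}}{7 \times 24\ \text{h}} = \frac{60}{168} \approx 0.357
```

Under those assumptions the environment runs about 36% of the hours an always-on environment would, roughly a 64% reduction in compute hours.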
16:00 Benjie: Now I’m going to jump into a product demo. I’m logged in to Shipyard and we are going to do a very basic demo.
The first thing I’m going to do is add an application: a Node backend / React frontend application.
We use Docker Compose in this particular instance to define our application.
I’m going to create the application.
The public has spoken, and we need to do ‘Hot Pink’.
17:58 Benjie: A lot of this stuff that we’re talking about is about integrating with your existing workflow, so I’m going to turn on my GitHub PR comments.
Shipyard has picked up my PR and is going to start building in the background.
While we wait for that, I’m going to show you a little bit of how we do this. At Shipyard, we are extremely opinionated. We have seen a lot of people shoot themselves in the foot because of how many options they have. We’ve been doing this for a very long time, and we feel like sometimes the biggest problem is actually that there are too many options.
Early on we chose Docker Compose as the application definition. We kind of have a saying, “I say NAY” (Not Another YAML).
19:29 Benjie: Our goal is to fit into the existing workflow, so we’re using Docker Compose as our application definition. The idea is that if you get it to work locally, we take care of the rest for you. I’ll note that we do extend it slightly.
In this particular instance, we have two services, the backend server and the frontend React client, and we want to route to the React client. We say, “hey, this is the route over here,” and that’s it. Really simple.
There’s a lot more advanced stuff I would love for you guys to check out in our docs, and we’ll have links at the end.
20:53 Benjie: We mentioned security. It’s really important to make sure that certain people do or do not have access. For all the environments we generate, there is always a layer of SSO that sits on top of them. We integrate with your GitHub, your G Suite, anything that supports SAML.
We don’t have any environment variables for this particular demo, but we do have a very nice, secure way to store them at rest.
One other thing to highlight: when building your application, you often have the problem of, “hey, where did things break on deployment?” We want to give developers, and sometimes product and QA people, the ability and tools to debug. We have a pretty cool build detail page and we have your run logs.
22:02 Benjie: One last thing to highlight is our terminal. We’re turning on pretty extensive environments, so we give you terminal access by leveraging some great open source projects.
If you take a look, you can actually see the running pods here for the various services, and you can exec in and take a look at logs. Again, this is about giving developers and everybody else the access to figure out what’s going on before you bring in your DevOps person. Oftentimes, DevOps folks still need to help a bit, but we are empowering developers to solve their own problems.
22:50 Benjie: Let’s get back to the whole point. We mentioned that we put SSO in front. Over here we have the ‘Hot Pink.’ It worked; the people have spoken.
The original version is on the right and the ‘Hot Pink’ version is over here. To close the loop, we also have direct links to this on my GitHub.
Fitting into the workflow is one of the most important things for a tool to be able to do. When implementing ephemeral environments, if you have to go searching for things, it’s not going to make anything easier, so you need it to fit within the workflow. We have a Slack integration, among other things.
I can get to these things directly, debug them directly, and fit within the existing workflow.
24:49 Benjie: One question that we get a lot, which I’m just going to address up front, is about external registries: we support any external registry, no problem.
Another thing: we have an API. If you’re running Shipyard environments in a CI/CD process and you want to run your tests against an environment that Shipyard has created, we offer that. You can use your API key to bypass the SSO layer, so your CI/CD process doesn’t need to do any weird GitHub OAuth or anything like that.
It’s really important that these environments are actually on-demand; that’s what our sleep scheduler is for.
Q&A
26:23 Ashley: “Where are these environments running? Is it through Shipyard’s own cloud?”
Benjie: We are running on AWS and GCP. We host these environments for you and manage these environments for you.
26:48 Ashley: “I have a multi-container app that I normally deploy to Kubernetes. Will I need to maintain both the Compose file and Helm chart for my application?”
Benjie: As of today, Docker Compose is our application definition.
Just to clarify here, I still think Docker Compose is a great way to encapsulate this stuff, but there are a lot of customers that have larger applications. The other thing that you bring up with that question, Ashley, is that you don’t want to be maintaining two things.
If you guys are really good at Kubernetes and all these other things and you just want to use Shipyard as an environment management platform, we would love to talk.
27:50 Ashley: “You’ve shown well-contained stuff, what if you need things like a new AWS account or RDS and the like?”
Benjie: That’s a great question. That’s where some of these technology stacks come into play. There’s a company, I think I mentioned it earlier actually, called LocalStack. It’s a massive 30,000-star GitHub project, and they mock out all the AWS services. What we’ve seen is a lot of people putting that into their infrastructure. These environments are production-like, not exactly production, so you can run LocalStack there.
One of our awesome engineers, Angel, is writing up a blog post on this: we have what we call the callback gateway. The idea is that you can have a one-to-many relationship with third-party services.
Let’s say you have a single sandbox account with G Suite. By using Shipyard, you can route the responses from that sandbox back to whichever environment made the request.
So it’s kind of a constant vigil, but the idea is that we have to do our best to encapsulate stuff. Now, there are other things you can do when it comes to dynamically creating external resources. If you want to do that type of integration, cool.
29:25 Ashley: There was some clarification in the chat that they do use LocalStack for local development, but that it doesn’t translate well to production. I think you kind of covered that with the callback gateway stuff.
“Does Shipyard work for non-HTTP/HTTPS services? For example, is there a way to tunnel to a PostgreSQL container?”
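The one-to-many idea behind a callback gateway can be sketched as a lookup table keyed by a correlation ID. In this minimal Python sketch, which is purely illustrative and not Shipyard’s implementation, the third-party sandbox is configured with a single callback URL (the gateway), each environment registers an ID before making its outbound request, and the gateway resolves incoming callbacks back to that environment. The class name and example URLs are hypothetical; where the correlation ID comes from depends on the provider (for OAuth redirects, the `state` parameter is echoed back to the callback).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CallbackGateway:
    """Hypothetical one-to-many router for third-party callbacks."""

    # correlation ID -> base URL of the ephemeral environment that made the request
    routes: dict[str, str] = field(default_factory=dict)

    def register(self, correlation_id: str, env_url: str) -> None:
        """Called by an environment just before it makes an outbound request."""
        self.routes[correlation_id] = env_url

    def resolve(self, correlation_id: str, path: str) -> Optional[str]:
        """Return the URL to forward the callback to, or None if the ID is unknown."""
        env_url = self.routes.get(correlation_id)
        if env_url is None:
            return None
        return env_url.rstrip("/") + "/" + path.lstrip("/")


gateway = CallbackGateway()
gateway.register("state-pr-12", "https://pr-12.envs.example.com/")
gateway.register("state-pr-15", "https://pr-15.envs.example.com")
print(gateway.resolve("state-pr-15", "/oauth/callback"))
```

Unknown IDs resolve to `None`, so a real gateway would log and drop those callbacks rather than guess which environment they belong to.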
Benjie: Yeah, we have the ability to help with that.
30:01 Ashley: “Is Shipyard compatible with applications that normally run with a service mesh like Istio?”
Benjie: Again, if you’re a Kubernetes pro, we want to talk to you, and yes, we would do that. It depends on what the dependencies of the service mesh are, but we bootstrap fully encapsulated, single-tenant Kubernetes clusters for each one of our customers, so that should be fine.
30:51 Ashley: “Does Shipyard work with any programming language?”
Benjie: Yes, as long as it’s containerized, you can use Shipyard for it. The one caveat is .NET: you have to use the Linux-compatible .NET. We don’t do Windows Docker containers, just Linux Docker containers. So .NET Core, yes, but not the Windows-only .NET Framework. That’s the one limitation.
31:21 Ashley: “How can Shipyard help me control my cloud costs?”
Benjie: A pattern that we’ve seen pop up a lot is this: you turn on an environment that’s supposed to be short-lived, not long-lived, but it kind of gets forgotten. We have something called ‘Since Last Visit,’ which basically shows how much time has passed since someone last visited an environment.
We also have an optional sleep schedule that says, “our developers or QA people or product people are not typically working from 9pm to 9am, so we’ll turn these environments off.” Of course, you can then go to the dashboard and turn any of these environments back on relatively quickly, within a few minutes depending on the complexity of your infrastructure.
32:27 Ashley: “If I have multiple repositories (front-end, back-end, etc.), can I still use Shipyard?”
Benjie: That’s a great question. That’s a really cool feature that we’re really proud of. We originally designed Shipyard for a monorepo, and we kept having people say, “well, we’ve got our front-end repository over here, we’ve got our back-end repository over here, we’ve got our IaC (infrastructure as code) repository over here. How can we combine those?”
We’ve taken the philosophy of making multi-repo applications first-class, so you can add as many repositories as you want. There’s an optional mode that rebuilds everything, and we do some pretty cool caching stuff as well.
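The two cost controls described above, a nightly sleep window and a “since last visit” interval, can be combined into one small policy check. This is a minimal Python sketch under stated assumptions: the 9pm to 9am window comes from the answer above, while the 48-hour idle limit and the function names are hypothetical, and the datetimes are naive, so a real scheduler would also need to handle each team’s time zone.

```python
from __future__ import annotations

from datetime import datetime, time, timedelta


def in_sleep_window(now: time, start: time, end: time) -> bool:
    """True if `now` is inside [start, end), where the window may wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end  # e.g. 21:00 -> 09:00 spans midnight


def should_sleep(
    now: datetime,
    last_visit: datetime,
    start: time = time(21, 0),
    end: time = time(9, 0),
    idle_limit: timedelta = timedelta(hours=48),  # hypothetical threshold
) -> bool:
    """Sleep during the nightly window, or when nobody has visited the
    environment for longer than `idle_limit`."""
    idle_for = now - last_visit  # the "since last visit" interval
    return in_sleep_window(now.time(), start, end) or idle_for > idle_limit


afternoon = datetime(2022, 4, 27, 14, 0)
print(should_sleep(afternoon, last_visit=datetime(2022, 4, 27, 13, 0)))  # False
print(should_sleep(afternoon, last_visit=datetime(2022, 4, 24, 14, 0)))  # True: idle 72h
```

Handling the midnight wrap explicitly matters here: a naive `start <= now < end` check with start=21:00 and end=09:00 would never be true.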
There was a Datadog logo at the beginning of the presentation, and I think that’s important to highlight. I think the person who was asking about the service mesh and Kubernetes YAMLs would probably be interested in this: we also have integrations where you can tie out to Datadog or New Relic or whatever it may be, so you can use the services you’re using in production from your ephemeral environments.
That’s another really important thing: you also want to mimic your ability to respond to incidents, so you know, “hey, is that Datadog integration still working? I hope so!”
To go back to your question: yes, we support multiple repos, we’re very proud of it. It’s just a click of a button and it aggregates it all together for you.
34:40 Ashley: “Is Shipyard free to use?”
Benjie: We currently offer a 30-day free trial to get started, with some limits on the number of DryDocks (think of them as “environments” for right now). Come check it out: shipyard.build is our website.
35:24 Benjie: If you’re serious about trying Shipyard, you can check out our Slack or our Twitter.
We have a bunch of sample repositories that kind of give you an idea of the power and capability. For the person that said that they are using LocalStack, we actually have an example of that in there.
The other thing that I really want to highlight here is that we are actually contributing a resource called ephemeralenvironments.io, a website that goes in depth about all the benefits and all the gotchas. It’s an open source, community-contributed project. Please: pull requests are welcome, quotes are welcome!