Image Credit for the Googling of images
Private/Public IaaS, and PaaS environments are some of the fastest moving technology domains of their kind right now. I give credit to the speed of change and adoption to the communities that surround them, open-source or within the enterprise. As someone who is in the enterprise but contributes to open-source, it is surprising to many to find out that within the enterprise there is a whole separate community around these technologies that is thriving. That being said, I have been working with os-level virtualization technologies now for the past (almost) 3 years and in the past 2 years most people are familiar with the “Docker-boom” and now with the resurgence or SoA/Microservices I think its worth while exploring what modern technologies are involved, what drivers for change and how it affects applications and your business alike.
Containers and Microservices are compelling technologies and architectures, however, exposing the benefits and understanding where they come from is a harder subject to catch onto. Deciding to create a microservice architecture for your business application or understanding which contexts are bound to which functionality can seems like the complexity isn’t worth it in the long wrong. So here I explore some of the knowledge of microservices that is already out there in understanding when and why to turn to microservices and figuring out why, as in many cases, there really isn’t a need to.
In this post I will hopefully talk about some of major drivers and topics of microservices and how I see them in relationship to data-center technologies and applications. These are my own words and solely my opinion, however I hope this post can help those to understand this space a little better. I will talk briefly about, Conway’s Law, what it means to Break Down the Silos, why it is important to Continuous delivery, the importance of the unix philosophy, how to define a microservice, their relationship to SOA, the complexity involved in the architecture, what changes in the organization must happen, what companies and products are involved int this space, how to write a microservice, the importance of APIs and service discovery, and layers of persistence.
The best definition in my opinion is “A Microservice fits in your head”. There are other definitions involving an amount of pizza, or a specific amount for lines of code, but I don’t like putting these boundaries on what a microservices is. In the simplest, a microservice is something that is small enough to conceptually fit in your head without really having to think to much about it. You can argue about how much someone can fit in their head etc, but then thats just rubbish and un-important to me.
I like to bring up the unix principle here as ice heard from folks at Joyent and other inn he field, this is the design that programs are designed to do one thing and do that one thing well. Like “ls” or “cat” for instance, typically if you design a microservice this way, you can limit its internal failure domain because it does one thing and exposes and API to do so. Now, microservices is a loaded term, and just like SoA there are similarities in these two architectures. But they are just that, architectures and I will add that you can find similarities in many of their parts but the some of the main differences is that SoA used XML, SOAP, typically a Single Message Bus for communication and a shared data source for services. Microservices uses more modern lightweight protocols like RESTful APIs, JSON, HTTP, RPC and typically a single microservice is attached to its own data source, whether is a copy, shard or a its own distributed database. This helps with multi-tenancy, flexibility and context boundaries that help scale such an architecture like microservices. One of the first things people start to realize when deep divining into microservices is the amount of complexity that comes out of slicing up the monolith because inherently you need to orchestrate, monitor, audit, and log many more processes, containers, services etc than you did with a typical monolithic application. The fact that these architectures are much more “elastic and ephemeral” than others forces technical changes that center around the smallest unit of business logic that helps deliver business value when combine with other services to deliver the end goal. This way each smaller unit can have its own change lifecycle, scale independently and be developed free of other dependencies within the typical monolith.
This drives the necessity to adopt a DevOps culture and change organizationally as each service should be developed by independent, smaller teams that can each release code within their own cycles. Teams still need to adhere to the invisible contracts that are between the services, these contracts are the APIs themselves between the services which talk to one another. I could spend an entire post on this topic but there is a great book called “Migrating to Cloud-Native Application Architectures” by Matt Stine of Pivotal (Which is free, download here) that talks about organizational changes, api-based collaboration, microservices and more. There is also a great post by Martin Fowler (here) that talks about microservices and the way Conways law affects the organization.
Importance of APIs
I want to briefly talk about the importance of orchestration, choreography and the important of the APIs that exist within a microservices architecture. A small note on choreography, this is another terms that may be new but its related to orchestration. Choreography is orchestration turned on its head, instead of an orchestration unit signaling when things happen, the intelligence is pushed to the endpoints and those endpoints react to events of changing environments, therefore each service known its own job. A great comparison of this is (here) in the book “Building Microservices” by Sam Newman. Rest APIs are at the heart of this communication, if an event is received from a customer of user, a choreography chain is then initialized and each endpoint talks to each other via these APIs, therefore, these APIs must remain robust, backward compatible and act as contracts between how services interact. A great post of the Netflix microservices work (here) explains this in a little more detail.
If the last few paragraphs and resources make some sense, you end up with a combination of loosely coupled services, strict boundaries, APIs (contracts), robust choreography and vital health and monitoring for all services deployed. These services can be scaled, monitored and moved independently without risk and react well to failures. Some of the exa plea of tools to hel you do this can be found at http://netflix.github.io. This all sounds great, but without taking the approach of “design for the integrations not the infrastructure/platform” (which I’ve heard a lot but can’t quite figure out who the quote belongs to, coudos to who you are :] ) this can fail pretty easily. There is a lot of detail I didnt cover in the above and I suggest looking into the sources I listed for a start on getting into the details of each part. For now I am going to turn to a few topics within microservices, Service Discovery & Registration and Data Persistence layers in the stack.
Service Discovery and Registration
Distributed systems at scale using microservices need a way to registry and discover what services and endpoints are available, enter Service Discovery tools like Consul, Etcd, Zookeepr, Eureka, and Doozerd. (others not listed) These tools make it easy for services to call this layer and find out a way to consume what else is available. Typically this helps one service find out how to connect and use another service. There are three main processes IMO for applications to use this layer:
- when a service gets installed or “comes up” it needs to initially be registered with the discovery layer. An example of this is Registrator (https://github.com/gliderlabs/registrator) which reacts to docker containers starting and sends key/value pairs of data to a tool like Consul or Etcd to keep current discovery data about the service. Such information could be IP Endpoint, Port, API URL/Path, Resources, etc that can be used by the service.
- Discovery is the other end of Registration, when a service wants to use a (for instance) “proxy”, how will it know where the proxy lives or how to access it? Typically in applications this information is in a configuration file or hard coded into the app, whith service discovery all the app needs to do is know how to implement the information owned by the registration mechanism. For instance, an app can start and immediately say “Where is ‘Proxy'” and the discovery mechanism can respond by saying “Here is the Proxy thats closest to you” or “Here is the first Proxy available” along with the IP and Port of that proxy, the app can then just use those values, typically given in JSON or XML and use them inside the application thus not ever hardcoding any configuration anywhere.
- Last but smallest is when the applications received the response back from the discovery mechanism it must know how to process and the the data. E.g. if your asking for a proxy or asking for a database the information given back would be different for the proxy versus how you would actually access the database.
Persistence and Backing Services
Today most applications are stateless applications, which means that they do not own any persistence themselves. You can think of a stateless application as a web-server, this web-server processes requests and talks to a database, but the database is another microsevice and this is where all the state lives. We can scale the web-server as much as we want and even actively load balance those endpoints without ever worrying about any state. However, that database I mentioned in the above example is something we should worry about, because we want our data to be available, and protected at all times. Though, you can’t (today) spin up your entire application stack (e.g. MongoDB, Express.js, Angular.js and Node.js) all in different services (containers) and not worry about how your data is stored, if you do this today you need persistent volumes that can be flexible enough to move with your apps container, which is hard to do today, the data container is just not as flexible as we need it be in todays architectures like Mesos and Cloud Foundry. Today persistence is added via Backing Services (http://12factor.net/backing-services) which are persistence / data layers that exist outside of the normal application lifecycle. This mean that in order to use a database one must first create the backing service then bind it to the application. Cloud Foundry does this today via “cf create-service and cf bind-service APPLICATION SERVICE_INSTANCE” where SERVICE_INSTANCE is the backing store, you can see more about that here. I won’t dig into this anymore other to say that this is problem that needs to be solved, and making your data services as flexible as the rest of your microservices architecture is not easy feat. The below link is a great article by Luke Marsden of ClusterHQ that talks about this very issue. http://www.infoq.com/articles/microservices-revolution
I also wanted to mention an interesting note on persistence in the way Netflix deploys Cassandra. All the data that Netflix uses is deployed on Amazon on EC2 instance and they use ephemeral storage! Which means when the node dies all their data is gone. But alas! they don’t worry about this type of issue anymore because Cassandra’s distributed, self healing architecture allows Netflix to move around their persistence layers and automatically scale them out when needed. I found out they do run incremental backups to S3 by briefly speaking with Adrian Cockcroft at offices hours at the Oreilly Software Architecture conference. I found this to be a pretty interesting point to how Netflix runs its operations for its data layers with Cassandra showing that these cloud-native, flexible, and de-coupled applications are actually working in production and remain reliable and resilient.
Some of the major players in the field today include: (I may have missed some)
- Apache Mesos
- Joyent (Manta)
- IBM Bluemix
- Cloud Foundry
- Amazon Elastic Container Service
- Openshift by RedHat
- OpenStack Magnum
- CoreOS (with Kubernetes) see (Project Tectonic)
- Docker (and Assorted Tools/Binaries)
- Cononical’s LXD
- Giant Swarm
There are many other open source tools at work like Docker Swarm, Consul, Registrator, Powerstrip, Socketplane.io (Now owned by Docker Inc), Docker Compose, Fleet, Weave. Flocker and many more. This is just a token to how this field of technology is booming and were going to see many fast changes in the near future. Its clear that future importance of deploying a service and not caring about the “right layers” or infrastructure will be key. Enabling data flexibility without tight couplings to the service is part of an the overall application design or the data service. These architectures can be powerful for your applications and for your data itself. Ecosystems and communities alike are clearly coming together to help and try to solve problems for these architectures, I’m sure some things are coming so keep posted.
Microservices on your laptop
One way to get some experience with these tools is to run some examples on your laptop, checkout Lattice (https://github.com/cloudfoundry-incubator/lattice/) from Cloud Foundry which allows you to run some microservice-like containerized workloads. This post is more about the high-level thinking, and I hope to have some more technical posts about some of the technologies like Lattice, Swarm, Registrator and others in the future.