A Plain English Introduction to Paxos Protocol

A few weeks ago, I had to have a look at the distributed consensus protocol Paxos. Even though I know its purpose and I’ve built and used distributed systems and databases in the past, Paxos remains mind-boggling at first !

The hard way

The best overall description I found is this answer by Vineet Gupta on Quora. After wrapping my head around it for a while, I finally gained the instinctive understanding which comes when you ‘get’ something.

As a way to both help others understand Paxos faster and to burn all this into my own memory, I thought it would be a good idea to illustrate it as a story (I was inspired by A plain English introduction to CAP Theorem which I found really instructive; I also later discovered that the original Paxos paper itself described the protocol using the metaphor of a parliament).

Once upon a time …

… there were 3 siblings, Kath, Joe & Tom, living happily. They lived far away from each other, and it was not easy for them to meet and spend some time together. They had neither phone nor internet, for this was a long time ago. All they had to discuss and share news was good old mail …

Unfortunately, one day, the worst happens : their parents die. All 3 are informed by a letter from the notary, telling them that they need to sell the family house in order to pay for their inherited debts. It also advises them to use Paxos to agree on a price (Note : I never said the story was going to be chronologically sound !).

The happy end

As the oldest in the family, Kath decides to take things in hand, and starts the whole thing. She knows Paxos consists of 2 phases : ‘prepare’ and ‘accept’.

Prepare Phase

Kath sends a signed and dated price value proposal to her brothers, by mail.

Joe and Tom both receive the letter from Kath, and they think the price is fair. In order to send their agreement back to Kath, they make a copy of the proposal, mark it as agreed, date it, sign it, and send it back.

Accept Phase

Joe lives a bit further away from Kath than Tom does, so correspondence between Kath and Tom is usually faster. Kath indeed receives the agreement from Tom first. She knows she can go on with the protocol straight away, because Paxos relies on majority, not unanimity : counting herself, she and Tom already form 2 out of 3. In his letter, Tom agreed to the same price she proposed, so she just picks this one as the final price to agree on.

She sends new letters, called accept letters this time, to her brothers to finalize the agreement. In these letters, she specifies the price that they are agreeing on, plus the date at which it was first suggested (see Prepare Phase). When Tom and Joe receive the accept letter, they simply need to check that the date and price of the proposal are what they agreed on, before sending back their final accept letter.

By the time Kath receives the accept letters from her brothers, everyone knows that the price has been agreed.

After

She then informs the notary of the agreed price, and he sends an information letter to Kath, Tom & Joe. The house is sold pretty quickly, leaving the family free of financial problems for the rest of their lives …

Shit happens

That story went unexpectedly well ! Let’s look at a few variations on what could happen in real life.

Joe is particularly slow to answer

Joe has never been good at paperwork … he’s always out partying and having fun, and he does not want to bother answering letters. When Joe receives the prepare letter from Kath, he does not reply straightaway but leaves it on his desk to handle later. Meanwhile, Tom answers as soon as he gets the letter. As mentioned before, Paxos relies on majority : as soon as Kath gets Tom’s answer, she can continue to the next phase. In fact, the accept phase also relies on majority, so she can continue to the end of the protocol if Tom continues to answer.

In this case, Joe would receive the accept letter before he sent his answer to the prepare letter, and would know that the consensus is moving on without him. He can try to catch up or not, but the consensus can be reached without him.

Tom wants to speed things up by becoming the master

Tom has always been the hurried brother. He does not like it when things linger forever but prefers things to be done quickly. As soon as he receives the letter from the notary, he starts waiting impatiently for the prepare letter from his sister. Kath, for her part, takes a lot of time to settle on a price. Not knowing what is going on, Tom decides to take action and take on the master role : he sends his own copies of the prepare letters. While these letters are in the mail, Kath finally settles on a price, and sends hers.

Joe gets Tom’s proposal first. Thinking that it’s a change in the plan, he responds straight away by signing the proposal and keeping a copy for himself. The following day, he receives Kath’s proposal ! He’s a bit surprised, but fortunately, Paxos tells him exactly what to do in this situation. By agreeing to Tom’s proposal, he made a promise to stick to it whatever happens later. Here the date on Kath’s proposal is later than on Tom’s, so Joe is going to answer Kath that he agrees, but to Tom’s proposal, of which he’ll enclose a copy.

After receiving Joe’s agreement on his proposal, Tom has the majority, and should be able to end the protocol.

What about Kath ?

She should have received Tom’s proposal, and rejected it, because she had already proposed a later value. That will not prevent Tom from reaching a consensus.

She should have received Joe’s agreement to Tom’s proposal. Likewise, she might also have received Tom’s agreement to his own proposal as an answer to hers. She’d get the majority of agreements, so she might then want to push on. For the accept letter, she must pick a value that has already been accepted : in this case, it’s Tom’s proposed value ! Everything ends as expected, as she’ll reach the same price as Tom.

Tom wants a higher price and becomes the master

Imagine Tom is obsessed with money ! When he receives Kath’s proposal, he’s outraged ! Believing the house is worth a lot more than the proposed price, he sets out to act as a master in Paxos and sends his own proposal letters to his brother and sister.

Unfortunately, when they receive his proposal, they have already agreed to Kath’s older proposal, so they send him back a copy of it as an agreement. Having received agreements to Kath’s value only, he cannot push forward his value. Whether he continues his Paxos or not does not really matter, as he would reach the same value as Kath would.

A river flood splits the brothers from Kath

There’s a wide river that separates Kath from Joe and Tom. While they are trying to reach consensus, the river floods, cutting all communication between the brothers and their sister. Kath might abort the consensus as she won’t be able to get answers from the majority. On their side, Joe or Tom can take over the consensus, take on the master role, and still reach a price, as they form a majority. As soon as the river settles, the messages arrive on both sides, eventually informing Kath that a price was accepted.

Lots of others

You can imagine zillions of ways in which the consensus between Kath, Joe and Tom could go wrong. For example :

  • Mail is so slow that Kath sends new proposals
  • One letter goes astray and only arrives after Kath has made a new proposal
  • Kath is struck by lightning

Go ahead and execute Paxos step by step on all of them : you’ll see that whatever happens, Kath, Joe and Tom will reach a price.

More Formally

Now that you have an instinctive understanding of Paxos, I encourage you to read the full explanation I found on Quora. Here is an extract with the protocol part :

Protocol Steps:

1) Prepare Phase:

  • A node chooses to become the Leader and selects a sequence number x and value v to create a proposal P1(x, v). It sends this proposal to the acceptors and waits till a majority responds.

  • An Acceptor on receiving the proposal P1(x, v1) does the following:

    • If this is the first proposal to which the Acceptor is going to agree, reply ‘agree’ – this is now a promise that the Acceptor would reject all future proposal requests < x
    • If there are already proposals to which the Acceptor has agreed: compare x to the highest seq number proposal it has already agreed to, say P2(y, v2)
      • If x < y, reply ‘reject’ along with y
      • If x > y, reply ‘agree’ along with P2(y, v2)

2) Accept Phase

  • If a majority of Acceptors fail to reply or reply ‘reject’, the Leader abandons the proposal and may start again.

  • If a majority of Acceptors reply ‘agree’, the Leader will also receive the values of proposals they have already accepted. The Leader picks any of these values (or if no values have been accepted yet, uses its own) and sends a ‘accept request’ message with the proposal number and value.

  • When an Acceptor receives a ‘accept request’ message, it sends an ‘accept’ only if the following two conditions are met, otherwise it sends a ‘reject’:

    • Value is same as any of the previously accepted proposals
    • Seq number is the highest proposal number the Acceptor has agreed to
  • If the Leader does not receive an ‘accept’ message from a majority, abandon the proposal and start again. However if the Leader does receive an ‘accept’ from a majority, the protocol can be considered terminated. As an optimization, the Leader may send ‘commit’ to the other nodes.
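
To make the acceptor rules above more concrete, here is a minimal Ruby sketch of a single-decree acceptor, plus one way a leader could pick the value for its accept request. Take it as an illustration under assumptions, not a reference implementation : the names Acceptor, prepare, accept and choose_value are made up for this example, the price is invented, everything lives in memory, and there is no networking or persistence. Where the extract says the leader picks "any" of the returned values, the sketch uses the rule from the original Paxos description and reuses the value of the highest-numbered accepted proposal.

```ruby
# Hypothetical in-memory sketch of a single-decree Paxos acceptor.
class Acceptor
  def initialize
    @promised = nil   # highest sequence number promised so far
    @accepted = nil   # [seq, value] of the last accepted proposal, if any
  end

  # Prepare phase: promise to ignore anything that is not newer than seq.
  def prepare(seq)
    return [:reject, @promised] if @promised && seq <= @promised
    @promised = seq
    [:agree, @accepted]
  end

  # Accept phase: accept unless a newer promise was made in the meantime.
  def accept(seq, value)
    return [:reject, @promised] if @promised && seq < @promised
    @promised = seq
    @accepted = [seq, value]
    [:accept, @accepted]
  end
end

# Leader side: reuse the value of the highest-numbered accepted proposal
# found in the 'agree' replies, or fall back to the leader's own value.
def choose_value(replies, own_value)
  prior = replies.select { |status, _| status == :agree }
                 .map { |_, accepted| accepted }
                 .compact
  return own_value if prior.empty?
  prior.max_by { |seq, _| seq }.last
end

# Replaying the "Tom becomes the master" variation from Joe's point of view
joe = Acceptor.new
joe.prepare(1)                    # Tom's prepare letter, dated 1
joe.accept(1, 300_000)            # Tom's accept letter
reply = joe.prepare(2)            # Kath's later prepare letter
p reply                           # → [:agree, [1, 300000]]
p choose_value([reply], 250_000)  # → 300000, Kath ends up with Tom's price
```

A real implementation would also have to count replies against the majority and store the promised and accepted state durably before answering.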

And here are the key concepts to map my story to this formal description of Paxos.

Story                            Paxos
proposal letter (and copy of)    P(x,v)
Date (and time)                  sequence number

At the time of slow mail based communication, using the date and time down to the second is enough to build up unique sequence numbers. In our current time of digital messages, it’s another story : a typical Paxos implementation assigns a different, disjoint, infinite set of integers to every participant. It does not exactly follow ‘time’, but it’s enough for the algorithm to work.
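
One common way to build such disjoint sets is to interleave them : with N participants numbered 0 to N-1, participant i only ever uses i, i + N, i + 2N, and so on. The sketch below assumes a fixed, known number of participants; the class SequenceGenerator and its next_number method are hypothetical names for this example.

```ruby
# Hypothetical generator of per-participant sequence numbers.
# Participant node_id (0...node_count) owns node_id, node_id + node_count, ...
class SequenceGenerator
  def initialize(node_id, node_count)
    @node_id = node_id
    @node_count = node_count
    @round = 0
  end

  # Smallest number owned by this participant that is greater than both
  # its previous number and highest_seen (e.g. learnt from a 'reject').
  def next_number(highest_seen = -1)
    candidate = @round * @node_count + @node_id
    while candidate <= highest_seen
      @round += 1
      candidate = @round * @node_count + @node_id
    end
    @round += 1
    candidate
  end
end

kath = SequenceGenerator.new(0, 3)
tom  = SequenceGenerator.new(2, 3)
p kath.next_number     # → 0
p kath.next_number     # → 3
p tom.next_number(3)   # → 5, Tom skips past the 3 he has seen from Kath
```

Two participants can never pick the same number, and anyone can always jump above a number they have seen.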

What Happens to Non-Enthusiast Programmers in the Long Run ?

A few months ago, after receiving good feedback from my regular readers, I posted my latest article Is There Any Room for the Not-Passionate Developer ? on Hacker News and Reddit. I got a huge number of visits, a lot more than I typically get !

I also got a lot more comments, some nice, some tough, some agreeable and some challenging !

First, a summary

In this previous article, I wanted to contrast the different views about work/life balance in the software industry.

Some, such as agile gurus, companies like Basecamp, and several studies, strongly advocate sane work hours. They explain that these result in greater productivity and a healthier life.

On the other hand, the software field is always bubbling with novelty, and keeping up to date with technologies is by itself a challenge that takes time. For some companies, which might already be fighting for their survival against competition, it is almost impossible to grant some extra training time to their employees. The problem becomes particularly difficult when engineers get older, become parents and cannot afford to spend some extra time learning the latest JavaScript framework.

As a conclusion, I said that for most of us, it’s really difficult to remain a developer in the long run without the grit that only passion for programming brings. I encourage you to read it for more details.

What I learned from the comments

First of all, thanks a lot for all of these : they were very valuable and forced me to think even more about the issue.

People have been burnt !

The word ‘passion’, in particular, triggered heated comments. As some pointed out, ‘enthusiast’ or ‘professional’ should be favored. It seems that some companies have asked their employees for unquestioning passion for their business (and not for engineering or programming) at the expense of people’s personal lives. As a commenter said, a lot of shops do not factor the absolute necessity for their programmers to learn continuously into their business model. It made me kind of sad to feel once more this state of our industry.

As a result, people are wary of any statement of ‘passion’ in the workplace, and would prefer to be seen as very skilled professionals, dedicated to keeping their skills up to date.

The particular question of France

I received some comments from all over the world, but my observations came from where I work : in France. Here, all in all, we have at least 7 weeks of paid leave per year. It’s a lot more than in other parts of the world. I think it’s around 2 weeks in the US (other sources point to the same fact). Imagine two companies, one from France, and one from the US. The one in the US can invest 5 weeks per year in exploratory learning (which can result in good things for both the business and the employee) while still producing as much as the French one.

Obviously, there are other parameters to take into account for overall productivity like hours per day, the effects of holidays or long hours on creativity, or funding … but here are some facts about software engineering in France :

  • 20% time policy, hackathons and other exploratory learning are extremely rare (I’ve seen it once in 15 years)
  • It’s slowly getting better, but if you remain a programmer in your thirties, you’re seen as a loser
  • France has no software powerhouse like Microsoft, Google, Apple …

This led me to this open question : What’s the effect of the 7 weeks of paid leave on the French software industry ?

By no means will I try to give an answer : I just don’t know. Plus, for those who might be wondering : I love my 7 weeks of holidays !

The conclusion I came to

Yet, I can try to draw a conclusion at the individual level. In France, if you’re not really enthusiastic about programming, you won’t put in the extra effort outside of work to learn the latest technologies. Within a few years, you’ll be ‘deprecated’, which will leave you with mainly 2 options :

  • become a manager
  • stick to your current codebase (and become completely dependent on your employer)

To me, the sad truth is that if you want to make a career as a professional developer in France, you’d better be ready to spend some of your free time practicing !

Verify the Big O Complexity of Ruby Code in RSpec

It might be possible to discover performance regressions before running your long and large scale benchmarks !

complexity_assert is an RSpec library that determines and checks the big O complexity of a piece of code. Once you’ve determined the performance critical sections of your system, you can use it to verify that they perform with the complexity you expect.

How does it work ?

The gem itself is the result of an experiment to learn machine learning in 20 hours (you can read more about that experiment in my previous post if you want).

Suppose you have a method, let’s call it match_products_with_orders(products, orders), which is called in one of your processes with very large arguments. Badly written, this method could be quadratic (O(n²)), which would lead to catastrophic performance in production. When coding it, you’ve taken particular care to make it perform in linear time. Unfortunately, it could easily slip back to a slower implementation with a bad refactoring … Using complexity_assert, you can make sure that this does not happen :

require 'complexity_assert'

# An adapter class to fit the code to measure in complexity assert
class ProductsOrdersMatching

    # Generate some arguments of a particular size
    def generate_args(size)
        # Let's assume we have 10 times less products than orders
        [ Array.new(size / 10) { build_a_product() }, Array.new(size) { build_an_order() } ]
    end

    # Run the code on which we want to assert performance
    def run(products, orders)
        match_products_with_orders(products, orders)
    end
end

describe "Products and Orders Matching" do

    it "performs linearly" do
        # Verify that the code runs in time proportional to the size of its arguments
        expect(ProductsOrdersMatching.new).to be_linear()
    end

end

That’s it ! If ever someone changes the code of match_products_with_orders and makes it perform worse than linearly, the assertion will fail ! There are similar assertions to check for constant and quadratic execution times.

Internally, the code is called a number of times with different (smallish) sizes of arguments and the execution times are logged. When this is over, by running different flavors of linear regression, it tries to determine whether the algorithm performs in O(1), O(n) or O(n²). Depending on your code, this can take time to run, but should still be faster than running large scale benchmarks.
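
To give an idea of how regressions can tell these complexities apart, here is a small sketch. It is not the gem's actual code : the function names, the tolerance value and the selection rule are my own assumptions for illustration. It fits the timings against a constant, against n and against n², computes the RMSE of each fit, and keeps the simplest model whose RMSE is close to the best one. The timings are synthetic and noise-free; real measurements would be noisier.

```ruby
# Least-squares fit of ys = a * xs + b, returning the RMSE of the fit.
def rmse_of_linear_fit(xs, ys)
  n = xs.size.to_f
  mean_x = xs.sum / n
  mean_y = ys.sum / n
  var_x = xs.sum { |x| (x - mean_x)**2 }
  cov_xy = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  a = var_x.zero? ? 0.0 : cov_xy / var_x
  b = mean_y - a * mean_x
  Math.sqrt(xs.zip(ys).sum { |x, y| (y - (a * x + b))**2 } / n)
end

# Hypothetical classifier: simplest model within `tolerance` of the best RMSE.
def guess_complexity(sizes, timings, tolerance: 0.2)
  rmses = {
    constant:  rmse_of_linear_fit(sizes.map { 0.0 }, timings),
    linear:    rmse_of_linear_fit(sizes.map(&:to_f), timings),
    quadratic: rmse_of_linear_fit(sizes.map { |n| (n**2).to_f }, timings)
  }
  threshold = rmses.values.min * (1 + tolerance) + 1e-12
  rmses.find { |_, rmse| rmse <= threshold }.first
end

sizes = [100, 200, 400, 800, 1600]
p guess_complexity(sizes, sizes.map { 0.5 })                      # → :constant
p guess_complexity(sizes, sizes.map { |n| 0.002 * n + 0.5 })      # → :linear
p guess_complexity(sizes, sizes.map { |n| 1e-6 * n * n + 0.5 })   # → :quadratic
```

Preferring the simplest acceptable model matters because a line always fits constant timings at least as well as a constant does.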

Just check the README for more details.

Did you say experiment ?

It all started as an experiment, so the gem itself is still experimental ! It’s all fresh, and it could receive a lot of enhancements like :

  • Allow the assertion to specify the sizes
  • Allow the assertion to specify the warm-up and run rounds
  • Robustness against garbage collection : use GC intensive ruby methods, and see how the regression behaves
  • Find ways to make the whole thing faster
  • O(lnx) : pre-treat with exp()
  • O(?lnx) : use exp, then a search for the coefficient (aka polynomial)
  • O(xlnx) : there is no well known inverse for that, we can compute it numerically though
  • Estimate how deterministic the assertion is

As you see, there’s a lot of room for ideas and improvements.

How I Got My Feet Wet With Machine Learning With ‘the First 20 Hours’

I’m currently wrapping up an alpha of a unit testing ruby gem that lets you assert the complexity of a piece of code. It’s the result of an experiment to learn some Machine Learning skills in 20 hours … not bad for a first try at Data Science ! This is the story of this experiment.

How did it all start ?

A few months ago, I read The First 20 Hours. The book describes a technique to get up to speed and learn some practical skills on any particular subject in only 20 hours. As examples, the author details how he managed to teach himself a pretty decent level of Yoga, Ukulele, Wind Surfing, Programming, Go and touch typing.

I decided to give it a try. In order to get a boost, I found a few motivated people at work to do it with me. I started by presenting them the technique described in the book, and asked everyone what they wanted to learn. After a quick vote, we set out to learn more about Machine Learning.

The technique

The method is meant to allow anyone to learn the skills necessary to accomplish a specific task in about 20 hours. In my case, I could expect to get a basic understanding of the Machine Learning concepts, as well as some practical skills to do something involving Machine Learning. Here are the details of the technique :

  1. H0 : Deep dive in the main concepts and theory of machine learning
  2. H6 : Define an ambitious and practical goal or skill level to achieve by the end, and an outline of how to get there
  3. H6 to H20 : Learn by doing

As you see, the technique is pretty simple !

How did it work ?

For the group

The plan for the group was :

  • to meet weekly for 2 hours
  • to share what we learned at the end of every session
  • to be bound by similar goals

At first, people were enthusiastic about learning something like machine learning. After a while, I started to get the following remarks :

  • “I don’t really see the point of doing this together rather than independently”
  • “I’m feeling a bit lost by not having a concrete goal and a plan from H0”
  • “I picked a target that’s too large for me”

The learning curve must have proven too steep, because as time went by, a lot of people dropped out, and we ended up being only 2 !

For me

The first phase was the toughest. As the author had warned in his book, “You’ll get deep above your head in theory and concepts you don’t know”, “You’ll feel lost”. He had some reassuring words though : “The steeper the learning curve, the more you’ll be learning !” I actually like this feeling of unknown things to learn, and that’s why I stuck to it.

It took me 8 hours, not 6, to get a good overall grasp of Machine Learning techniques. The theory was just too wide and interesting and I could not cut the learning after just 6 hours :–). I studied Machine Learning for developers plus a few other pages for details on specific points. I took and kept notes about what I learned. I chose my subject “unit testing algorithm complexity” for the following reasons :

  • I could imagine some utility
  • I had been writing benchmarks at work for 3 years, and I knew the practice well enough
  • It’s pretty easy to generate data for this subject : just run your algorithm !
  • It seems a good first step, doable with basic Machine Learning techniques like linear regression
  • It seems small enough to get something working in 12 hours
  • I could use ruby, which I find both fast and pleasant to program in

This is the plan I set out :

  1. Generate data with a linear algorithm (O(n))
  2. Run linear regression on the data
  3. Compute the RMSE of the model
  4. Deal with Garbage Collection in order to reduce its noise
  5. Deal with interpreter warm-up for the same reason
  6. Generate data for a constant (O(1)) algorithm and build a model for it
  7. Find a way to identify if an algorithm is constant or linear from its execution timings
  8. Generate data for a quadratic (O(n²)) algorithm and build a model for it
  9. Identify if an algorithm is constant, linear or quadratic
  10. Package all this in an RSpec library
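
Steps 1, 4 and 5 of this plan boil down to collecting clean timings. Here is one possible way to do it with Ruby's standard Benchmark module, as a sketch only : measure_timings is a hypothetical helper, the default round counts are arbitrary, and keeping the fastest run of each size is just one simple way to reduce noise, not necessarily what the gem does.

```ruby
require 'benchmark'

# Times the given block for each size and returns [size, seconds] pairs.
def measure_timings(sizes, warmup_rounds: 2, rounds: 5)
  sizes.map do |size|
    # Untimed runs first, so interpreter warm-up does not pollute the timings
    warmup_rounds.times { yield size }

    runs = Array.new(rounds) do
      GC.start     # collect garbage before the timed run ...
      GC.disable   # ... and keep the collector quiet during it
      begin
        Benchmark.realtime { yield size }
      ensure
        GC.enable
      end
    end

    [size, runs.min]   # keep the fastest run as the least noisy one
  end
end

# Example with a linear algorithm: summing n numbers
timings = measure_timings([1_000, 2_000, 4_000]) { |n| (1..n).sum { |i| i * 2 } }
timings.each { |size, seconds| puts "#{size}: #{seconds}" }
```

The resulting pairs are exactly the kind of data a regression can then be run on.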

It started well, and I made good progress. Unfortunately, as people dropped out of the group and I got more urgent things to do at work, I had to pause my project for a while. It’s only been since last week that I got some time during my holidays to finish this off. I’m currently at H18, and I’ve completed all steps from 1 to 9.

As I said, the project is still in early alpha. There are a lot of points on which it could be improved (more complexities, faster, more reliable …). Even though I did not tackle the more advanced machine learning techniques, I now understand the overall process of ML : explore to get an intuitive grasp of the data, try out a model, see what happens, and repeat … I feel that learning these more advanced techniques would be easier now.

My opinion on the method

Overall, I found the experiment really effective : it’s possible to learn quite a lot by focusing on it for 20 hours. A word of warning though : you need to be really motivated and ready to stick through difficulties.

It’s also been very pleasant. I’ve always loved to learn new things, so I might be a little biased on that aspect. I liked the first part when I felt that there was a lot to learn in a large subject I knew almost nothing about. I loved the second part too, although this might be more related to machine learning, because I felt like exploring an unknown (data set) and trying to understand it.

I’ve never been afraid to learn something, and doing this experiment taught me I can learn anything fast ! I’ll definitely use the method again.

One last word about doing this in a group. My own experiment did not work very well. Most people were not comfortable with the first ‘explore’ phase. I guess one could make it work better by starting 6 or 8 hours before the rest of the group, enough to grasp the basic concepts and come up with a few end goals. Having concrete targets from day 1 should help people to stick through and to collaborate. The ‘guide’ could also help the others through the first phase.

Stay tuned, I’ll present my gem in a following post

Overclocking a Scrum Team to 12

From Wikipedia :

Overclocking is configuration of computer hardware components to operate faster than certified by the original manufacturer …

It is said that Scrum teams work best at 7 people, and that they break at about 10. The trouble is that sometimes there is just too much work for 7 people, but not enough for a full Scrum of Scrums. What if there was a simple way to hack this number up to 12 ?

An Idea

The Surgical Team

In his classic The Mythical Man-Month, Fred Brooks presents an idea to organize software development the way surgeons work. The master performs the surgery while the rest of his team (intern or junior surgeon and the nurses) are there to support him. Fred Brooks imagined an organization where master developers could be the only ones with access to the production code, while other more junior developers would have the task of providing them with tools and technical libraries.

I admit that this idea sounds out-of-fashion in contrast with modern agile teams of generalists … Still …

Tools

At work, we are working on a pretty technical and complex product which requires some time getting into both the code and the domain. We took a few interns during the past years, and a bit like Fred Brooks, we came to the conclusion that internships yield more results when focused on building supporting tools rather than joining the team and working on production code.

We’ve also been doing retrospectives for 3 years now. We’ve stolen a lot of best practices from the industry and the team is working a lot better than it used to. The flip side of this is that nowadays, the opportunities for improvement that we discover are a lot more specific, and they often need us to take some time to build new tools to support our work.

The Agile Surgical Team

Agile methods such as Scrum or XP are all about creating real teams instead of a collection of individuals. That means that if we wanted to adopt the surgical team idea, we could use teams instead of individuals : a team of experts, and a tooling team of apprentice developers !

Why not ? There’s nothing really new here, but the challenge is to run such a tooling team efficiently !

  • 3 people or less : there’s evidence in the industry that micro teams can self organize in an ad-hoc manner
  • Mandate ScrumBan, Continuous Delivery and DevOps : the on-site customer makes this possible, and it should reduce project management overhead to almost nothing and enforce quality
  • A sandbox for junior developers : there’s no risk of messing up production code here, the domain (tools for software developers) is straightforward and the fast feedback provides a great environment for learning

Obviously, for this to work, you’ll also need to have enough tooling work to do for a 3-person team. That’s usually the case : the CI alone can take quite some time (see Jez Humble’s talk Why Agile Doesn’t Work) and any team will have its own custom tools to build. For example, in our team, we built our own benchmark framework and we could benefit a lot from Saros on IntelliJ.

Not quite there yet

I promised to scale up to 12. Let’s do the maths :

  • 3 people in the tooling team
  • 8 people in the product team if we push Scrum a bit

That’s only 11, 1 is missing. This one is more specific to each team’s context.

As I said earlier, the product we are building is pretty technical and complex. Sometimes, we simply don’t know how we are going to do something. We need to try different ways before finding the right one. The typical agile way of doing that is by conducting time-boxed spikes. Spikes are fine for code and design related issues but way too short to deal with hard R&D problems. These need a lot of uninterrupted time for research and experiments, so it’s not really possible to split them into backlog stories that anyone can work on either …

The R&D Role

Here is what you want : some uninterrupted time to learn and experiment with different ways to do something difficult.

Here is what you don’t want :

  • specialists in the team
  • people out of sync with the daily production constraints
  • a never ending ‘research’ topic

Here is a simple solution in the context I describe : add someone to the product team, and do a 2 month round robin on hard subjects. This should leave plenty of time to study something different, but not so much time that one loses connection with the rest of the team. Plus it brings a bit of diversity to everyone’s daily work. One issue I can think of is that working in isolation might leave someone on the wrong track; regularly presenting what was done to the rest of the team might alleviate this concern.

A final word

Obviously, this has a smell of specialization : we’re bending Scrum principles a bit. So take it for what it is : just like overclocking, it’s a hack to get a bit of extra juice before investing in something a lot more expensive (Scrum of Scrums, LeSS or whatever).

How to Kill Scrum Zombies ?

First of all, what is that ? Usually, Scrum zombies go in groups, and quite often, you’ll find a full team of them :

A typical team of Scrum Zombies follows Scrum pretty well, does all the ceremonies, adopted good engineering practices, and might even be delivering OK. But all the fun is gone, everyone is on autopilot, no continuous improvement is happening anymore, retrospectives are dull and repetitive … There’s a gut feeling in the air that sooner or later, the project will miss a turn.

Sounds familiar ?

What’s going on exactly ?

When dev teams want to get more (agile|lean|reactive|.*) they often resort to hiring a full time coach. At first, a coach can have a great impact on the team. He will unblock change, show different ways of getting things done and train the team in new practices. Once all this is done, the coach becomes like any team member, or sometimes just leaves. That’s the point when the team, as a whole, has to take on responsibility for continuous improvement.

What’s needed then ?

The team needs to be able to conduct their own experiments and improvements. For this, they need divergent thinking, or creativity, or thinking out of the box; name it as you prefer. In a complex world, no single individual can bring all the answers to all the upcoming issues any team will face. Once the coach has put in place the practices necessary for continuous improvement, it’s up to the team.

Contrary to popular belief, creativity does not come out of thin air, it is cultivated !

Diversity in the team

Diversity does not mean minority quotas in your team. Diversity means diversity of interest, of way of thinking, of mentality, of way of working … The more diverse your team members, the more likely they’ll find innovative ways to work out of their current problems.

Slack

Removing any slack from the planning is the surest way to kill creativity and innovation. Great ideas often come at unexpected moments (see Pragmatic Thinking and Learning: Refactor Your Wetware) because the mind works in the background to find them. You want to leave some time for that.

Go to conferences

Creativity builds on creativity. Great ideas are often adaptations of one or many existing ones. Going to conferences is a great way to collect a lot of ideas !

Share trainings and lectures

Different people might react differently to the same information. When a team member finishes reading a book or comes back from a training, it’s a great idea to have him present what he learned to the others. This will reinforce his own learning, but it might also trigger new ideas in his team mates.

A dash of turnover

Too much turnover can be fatal to a team, but not having any will bring other kinds of problems too. Newcomers will challenge the status-quo, and the “this is how it’s done here” motto won’t be enough for them. That’s just what’s needed to trigger a sane re-examination of the current practices. Oh … and turnover between teams is fine too ! If your company is large enough, you don’t need to hire or fire people to create turnover, just encourage them to move to other teams !

The tricky part of complexity

By leaving time for other things than just cranking out stories, life will come back into the project, and zombies should go away. But wait, there’s even more !

Software projects are pretty complex beasts. One of the most counter-intuitive things about these complex systems is that they make planning very difficult. Focusing too much on your main goal might be slowing you down !

In the face of complexity your project landscape is like a maze of tunnels ! Who said you’re choosing the best ones ? By keeping free time to explore other, seemingly unrelated, topics you might discover opportunities that will remove a lot of the work to get to your final destination !

Continuously Deliver a Rails App to Your DigitalOcean Box Using Docker

I decided to use my latest side project as an occasion to learn Docker. I first used Heroku as a platform for deployment (see previous post). It works fine but I discovered the following shortcomings :

  • Heroku does not deploy with Docker, which means that I’d get quite different configurations between dev and prod, whereas avoiding exactly that is one of the promises of Docker :(
  • The Dockerfile provided by docker runs bundle install in a directory outside of the docker main shared volume, which forces me to run bundle update twice (once to update Gemfile.lock and a second time to update the actual gems …)

None of these issues could be fixed without moving away from Heroku.

A great Tutorial / Guide

I followed Chris Stump’s great tutorials to set up Docker for my app, to continuously integrate on CircleCI and to continuously deploy on a private virtual server on DigitalOcean.

The first 2 steps (Docker & CI) really worked out of the box after following the tutorial. Dealing with step 3 (CD) was a bit more complicated, because of :

  1. the specificities of DigitalOcean
  2. the fact that I’m no deployment expert …

What I needed to do to make it work

Setup SSH on the DigitalOcean box

I started by creating a one-click DigitalOcean box with Docker pre-installed. That’s the moment where I had to set up SSH so that CircleCI could deploy to my box. DigitalOcean has a guide for this, but here is how I did it :

  1. Create a special user on my dev machine with adduser digitaloceanssh
  2. Log in as this user with su digitaloceanssh, and generate SSH keys for it with ssh-keygen
  3. Print the public key with cat ~/.ssh/id_rsa.pub and copy-paste it into your DigitalOcean box setup
  4. Print the private key with cat ~/.ssh/id_rsa and copy-paste it into your CircleCI job SSH keys

The benefit of this is that you should now be able to ssh into your DigitalOcean box from your digitaloceanssh user with ssh root@<ip.to.digital.ocean>
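
Steps 2 to 4 can also be scripted. Here is a minimal sketch, assuming OpenSSH’s ssh-keygen is installed; the function name and the key directory are my own illustrative choices, nothing DigitalOcean or CircleCI require :

```shell
#!/bin/sh
# Sketch: generate a dedicated, passphrase-less deployment key pair.
# generate_deploy_key and its directory argument are hypothetical names.
generate_deploy_key() {
  key_dir="$1"
  mkdir -p "$key_dir" || return 1
  chmod 700 "$key_dir" || return 1
  # Refuse to overwrite an existing key instead of letting ssh-keygen prompt.
  if [ -e "$key_dir/id_rsa" ]; then
    echo "$key_dir/id_rsa already exists" >&2
    return 1
  fi
  # -N "" sets an empty passphrase (a CI job cannot type one), -q is quiet.
  ssh-keygen -t rsa -b 4096 -N "" -q -f "$key_dir/id_rsa" || return 1
  echo "Public key, to paste into the DigitalOcean box setup :"
  cat "$key_dir/id_rsa.pub"
  echo "Private key, to paste into the CircleCI SSH keys : $key_dir/id_rsa"
}

# Usage: generate_deploy_key "$HOME/.ssh/digitalocean"
```

Keeping this key in its own directory means it can be revoked later without touching your personal keys.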

Optional : update the box

The first time I logged into my box, I noted that packages were out of date. If you need it, updating the packages is a simple matter of apt-get update && apt-get upgrade

Fix deployment directory

By default, the home dir of the root user on the DigitalOcean box is /root/. Unfortunately, Chris Stump’s tutorial assumes it to be /home/root/. In order to fix that, I ssh-ed in the box and created a symbolic link : ln -s /root /home/root.
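
If you script the box setup, the same fix can be made safe to run more than once. This is only a sketch; link_home is a made-up helper name :

```shell
#!/bin/sh
# Sketch: create the symbolic link only if nothing sits at the target path yet.
# link_home is a hypothetical helper, not part of any tutorial or tool.
link_home() {
  real_home="$1"
  expected_home="$2"
  if [ -e "$expected_home" ] || [ -L "$expected_home" ]; then
    echo "$expected_home already exists, leaving it alone"
    return 0
  fi
  mkdir -p "$(dirname "$expected_home")" || return 1
  ln -s "$real_home" "$expected_home"
}

# On the box, as root: link_home /root /home/root
```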

Install docker-compose on the box

Chris Stump’s tutorial expects docker-compose on the deployment box, but DigitalOcean only installs Docker on its boxes … Install instructions for docker-compose can be found here. Don’t use the container option : it does not inherit environment variables, and will fail the deployment. Just use the first, curl-based alternative.

Warning : replace ALL dockerexample

This may seem obvious, but be sure to replace all the references to ‘dockerexample’ with your own app name in all of Chris Stump’s templates (I forgot some and lost a few rebuilds because of that).
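
A quick way to catch the references I missed would have been to grep for them before pushing. A small sketch, where find_leftovers is a name I made up, and the case-insensitive match is my assumption that variants like ‘DockerExample’ may appear too :

```shell
#!/bin/sh
# Sketch: list every file that still mentions the template's app name.
# find_leftovers is a hypothetical helper. -r recurses, -i ignores case,
# -l prints only the names of matching files.
find_leftovers() {
  placeholder="$1"
  dir="${2:-.}"
  grep -ril -- "$placeholder" "$dir"
}

# Usage, from the root of the Rails app: find_leftovers dockerexample
```

When nothing is left, grep prints nothing and returns a non-zero status, so the function can also serve as a check in a script.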

Create the production DB

Chris Stump’s deployment script assumes an existing production DB, so the first migration will fail. To fix this, just do the following :

  1. ssh into the DigitalOcean server
  2. run DEPLOY_TAG=<latest_deploy_tag> RAILS_ENV=production docker-compose -f docker-compose.production.yml run app bundle exec rake db:create

You can find the latest DEPLOY_TAG from the CircleCI step bundle exec rake docker:deploy

How to access the logs

It might come in handy to check the logs of your production server ! Here is how to do this :

  1. ssh into your production server
  2. run the following to tail the logs : DEPLOY_TAG=`cat deploy.tag` RAILS_ENV=production docker-compose -f docker-compose.production.yml run app tail -f log/production.log

Obviously, tail is just an example, use anything else at your convenience.
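
Since the DB creation and the log commands share the same long prefix, a small wrapper saves some typing. This is a sketch, not part of Chris Stump’s templates : prod_compose is a hypothetical name, and it assumes you run it from the directory holding docker-compose.production.yml (and deploy.tag, unless DEPLOY_TAG is already set).

```shell
#!/bin/sh
# Sketch: run a one-off command in the production app container.
# prod_compose is a hypothetical helper name.
prod_compose() {
  # Prefer an explicit DEPLOY_TAG, else fall back to the deploy.tag file.
  tag="${DEPLOY_TAG:-}"
  if [ -z "$tag" ]; then
    if [ ! -f deploy.tag ]; then
      echo "set DEPLOY_TAG or run this next to deploy.tag" >&2
      return 1
    fi
    tag="$(cat deploy.tag)"
  fi
  DEPLOY_TAG="$tag" RAILS_ENV=production \
    docker-compose -f docker-compose.production.yml run app "$@"
}

# Usage:
#   DEPLOY_TAG=<latest_deploy_tag> prod_compose bundle exec rake db:create
#   prod_compose tail -f log/production.log
```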

Generate a secret token

Eventually, the build and deployment job succeeded … I still had one last error when I tried to access the web site : “An unhandled lowlevel error occurred. The application logs may have details.” After some googling, I understood that this error occurs when no secret key base is set for your Rails app (details). There is a Rails task to generate a token, so all that was needed was to create a .env file on the server with the following :

SECRET_KEY_BASE=<GENERATED_SECRET...>
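
If you would rather script the creation of this file, here is a minimal sketch. Note that it does not call the Rails task : as a stand-in, it reads 64 random bytes from /dev/urandom and hex-encodes them, which gives a 128 character token. write_secret_env is a made-up name.

```shell
#!/bin/sh
# Sketch: write a .env file holding a random SECRET_KEY_BASE.
# write_secret_env is a hypothetical helper. The secret comes from
# /dev/urandom (64 bytes, hex encoded), not from the Rails task.
write_secret_env() {
  env_file="${1:-.env}"
  # Refuse to clobber a file that may already hold other variables.
  if [ -e "$env_file" ]; then
    echo "$env_file already exists, not overwriting it" >&2
    return 1
  fi
  secret="$(head -c 64 /dev/urandom | od -An -v -tx1 | tr -d ' \n')"
  # The file holds a credential, so create it readable by its owner only.
  ( umask 077 && printf 'SECRET_KEY_BASE=%s\n' "$secret" > "$env_file" )
}

# Usage, on the server, in the deployment directory: write_secret_env .env
```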

What’s next ?

Obviously, I learned quite a lot with this Docker exploration. I am still in the discovery phase, but my planning poker side project is now continuously built on CircleCI, and deployed to a DigitalOcean box.

The next steps (first, find a better subdomain, second, speed up the build job) will tell me if this kind of deployment is what I need for my pet projects. If it turns out too complicated or too difficult to maintain, Dokku is on my radar.

ReXP : Remote eXtreme Programming

My colleague Ahmad from Beirut gave a talk at Agile Tour Beirut on Saturday about how we adapted XP to a distributed team at work. I gave him a hand and played the remote guy during the talk.

With me through Skype, we did a first demo of remote pair programming on FizzBuzz using IDEA and Floobits.

We then did a demo of remote retrospectives using Trello.

When should I use ReXP

The conclusion is that :

  • If people are spread over 2 or a few cities, and there are enough of them at every place to build a team, just build a separate team at every location
  • If people are spread over a lot of places, maybe involving many time zones, then the open source, pull request based work-flow seems the best
  • Otherwise, if there are not enough people to build 2 teams, they are spread over only a few locations, and the time difference is not too big, then stretching XP to Remote will work best

As it is said that “nothing beats XP for small collocated teams”, I guess “nothing beats ReXP for small almost collocated teams”.

Tools to make it better

As Ahmad said in his talk, tools already exist. We could add that more would be welcome :

  • Floobits or Saros help tremendously for remote pairing, but maybe cloud based IDEs like Eclipse Che or Cloud 9 will make all these tools useless !
  • Trello works well for remote retrospectives, but some great activities like the 5 whys are still difficult to do with Trello. I’m sure new tools could do better.
  • I’m currently building a remote planning poker app
  • My other colleague Morgan wants to build a virtual stand up token to make it flow

Finally, here are the slides :

EDIT 2016/11/23 : the full video is now on YouTube

3 More Great Talks From JavaOne 2016

After the top 5 talks I attended at JavaOne, here are more !

Managing Open Source Contributions in Large Organizations

James Ward

This talk was very interesting for companies or organizations that want to use Open Source in some way without ignoring the risks.

After an introduction listing the benefits of contributing to open source, James explained the associated risks :

  • Security (evil contributions or information leaks)
  • Quality (bad contributions, increased maintenance or showing a bad image)
  • Legal (responsibility in case of patent infringing contribution, ownership of a contribution, licenses)

He then explained that there are 3 ways to deal with the issue :

| Strategy | Description | Pros | Cons | Popularity | Examples |
|---|---|---|---|---|---|
| Do nothing | Devs just contribute without saying it | Easy, Gets it done | Need to stay under the radar, Risks for all parties are ignored | +++++ | Most open source code on GitHub is shared in this manner |
| Join a foundation | Joining an existing open source foundation, with a framework | Everything out of the box (infra, governance), builds trust | Rules can be heavy, Ownership is given to the foundation | +++ | LinkedIn put Kafka in the Apache Foundation |
| Build tools | Use your own tools to mitigate the main risks associated with the ‘Do nothing’ strategy | Built on top of GitHub, Keep control, Keeps things easy | Need to develop, test and operate the tools | + | Demo of a tool plugged into GitHub to enforce a contributor license agreement for anyone pushing a pull request |

The ‘build tools’ strategy looks promising, even if it is not yet widely used !

Here are the talk and the slides on the author’s website.

Java Performance Analysis in Linux with Flame Graphs

Brendan Gregg

This is what a flame graph looks like :

Technically, it’s just an SVG with some Javascript. It shows the performance big picture. It aggregates data from Linux and JVM profilers. Vertically, you can see the call stacks in your system. The larger a block, the more time is taken inside a function (or in a sub call). The top border is where the CPU time is actually taken. If you want to speed up your system, speed up the wider zones at the top of the graph.
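
To make the “top border” idea concrete, here is a small sketch of my own (not something shown in the talk). It assumes the profiler samples have already been collapsed into the folded text format, one “frame1;frame2;frame3 count” line per distinct call stack, and sums the samples per topmost frame. top_frames is a made-up name, and frame names are assumed to contain no spaces.

```shell
#!/bin/sh
# Sketch: sum the samples per top-of-stack frame from folded stacks on stdin.
# top_frames is a hypothetical helper; input lines look like "a;b;c 42".
top_frames() {
  awk '{
         n = split($1, frames, ";")   # frames[n] is the function on CPU
         on_cpu[frames[n]] += $2      # $2 is the sample count of this stack
       }
       END { for (f in on_cpu) print on_cpu[f], f }' | sort -rn
}

# Usage:
#   printf '%s\n' 'main;parse;readLine 30' 'main;render 50' 'main;parse 20' | top_frames
# prints:
#   50 render
#   30 readLine
#   20 parse
```

The functions at the top of this list are the wide zones at the top of the graph, the ones worth speeding up first.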

At Netflix, the speaker is a performance engineer, and his job is to build tools to help other teams discover performance issues. This is how they use Flame Graphs :

  • Compare 2 flame graphs at different times to see what changed
  • Do a canary release and compare the new flame graph before finishing the deployment
  • Taking continuous flame graphs on running services helps identify JVM behavior like JIT or GC
  • They use different color themes to highlight different things
  • They also use them to identify CPU cache misses

By the way, I also thought this was a great example of using an innovative visualization to manage tons of data.

I could find neither the video nor the slides of the talk, but I managed to find a lot of other talks about Flame Graphs, as well as extra material on the speaker’s homepage.

Increasing Code Quality with Gamification

Alexander Chatzizacharias

You might be wondering why we should care about gamification ?

  • Worldwide 11.2 billion hours are spent playing every week !
  • People love to play because it makes them feel awesome
  • Games are good teachers
  • At work we are the ones who need to make others successful
  • But only 32% of workers are engaged in their work !

Games rely on 4 main dynamics :

  • Competition (be very careful of closed economics which can be very bad for teams)
  • Peer pressure (Public stats push teams and individuals to conform to the norm)
  • Progression (regular recognition of new skills is motivating)
  • Rewards (Badges, Level ups, Monkey Money, real money …)

He went on to demonstrate two games that are based on Jenkins and Sonar that aim at better code quality :

  • One mobile app developed during a 24h Hackathon at CGI which might be open sourced at some point
  • Another one called ‘Dev Cube’ created at a university, where you get to decorate your virtual cubicle

At the end of the talk, he gave the following recommendations :

  • Understand the needs of all to respond to everyone’s personal goals
  • Don’t assign things to do, that’s not fun, give rewards instead
  • Keep managers out of the picture
  • To keep it going, you need regular improvements, special events and new rules
  • KISS !

Playing at work might not be unproductive in the end !

The same talk was given at NLJug; unfortunately, it’s in Dutch. English slides are here though.