How I Got My Feet Wet With Machine Learning With ‘The First 20 Hours’

I’m currently wrapping up an alpha of a unit testing Ruby gem that lets you assert the complexity of a piece of code. It’s the result of an experiment to learn some Machine Learning skills in 20 hours … not bad for a first try at Data Science ! This is the story of that experiment.

How it all started

A few months ago, I read The First 20 Hours. The book describes a technique to get up to speed and learn practical skills on any particular subject in only 20 hours. As examples, the author details how he taught himself Yoga, Ukulele, Windsurfing, Programming, Go and touch typing to a pretty decent level.

I decided to give it a try. To get a boost, I found a few motivated people at work to do it with me. I started by presenting the technique described in the book to them, and asked everyone what they wanted to learn. After a quick vote, we set out to learn more about Machine Learning.

The technique

The method is meant to let anyone learn the skills necessary to accomplish a specific task in about 20 hours. In my case, I could expect to get a basic understanding of Machine Learning concepts, as well as some practical skills to build something involving Machine Learning. Here are the details of the technique :

  1. H0 : Deep dive into the main concepts and theory of machine learning
  2. H6 : Define an ambitious and practical goal or skill level to achieve by the end, and an outline of how to get there
  3. H6 to H20 : Learn by doing

As you see, the technique is pretty simple !

How did it work ?

For the group

The plan for the group was :

  • to meet weekly for 2 hours
  • to share what we learned at the end of every session
  • to be bound by similar goals

At first, people were enthusiastic about learning something like machine learning. After a while, I started to get the following remarks :

  • “I don’t really see the point of doing this together rather than independently”
  • “I’m feeling a bit lost by not having a concrete goal and a plan from H0”
  • “I picked up a target that’s too large for me”

The learning curve must have proven too steep, because as time went by, a lot of people dropped out, and only 2 of us were left in the end !

For me

The first phase was the toughest. As the author had warned in his book, “You’ll get deep above your head in theory and concepts you don’t know”, “You’ll feel lost”. He had some reassuring words though : “The steeper the learning curve, the more you’ll be learning !” I actually like the feeling of having many unknown things to learn, and that’s why I stuck to it.

It took me 8 hours, not 6, to get a good overall grasp of Machine Learning techniques. The theory was just too wide and interesting, and I could not stop learning after just 6 hours :-). I studied Machine Learning for developers plus a few other pages for details on specific points, and I took and kept notes about what I learned. I chose “unit testing algorithm complexity” as my subject for the following reasons :

  • I could imagine some utility
  • I had been writing benchmarks at work for 3 years, and I knew the practice well enough
  • It’s pretty easy to generate data for this subject : just run your algorithm !
  • It seemed a good first step, doable with basic Machine Learning techniques like linear regression
  • It seemed small enough to get something working in 12 hours
  • I could use Ruby, which I find both fast and pleasant to program in

This is the plan I set out :

  1. Generate data with a linear algorithm (O(n))
  2. Run linear regression on the data
  3. Compute the RMSE of the model
  4. Deal with Garbage Collection in order to reduce its noise
  5. Deal with interpreter warm-up for the same reason
  6. Generate data for a constant (O(1)) algorithm and build a model for it
  7. Find a way to identify whether an algorithm is constant or linear from its execution timings
  8. Generate data for a quadratic (O(n²)) algorithm and build a model for it
  9. Identify if an algorithm is constant, linear or quadratic
  10. Package all this in an RSpec library
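The gem’s code isn’t shown in this post, so here is a minimal Ruby sketch of steps 1 to 9 under my own assumptions, not the gem’s actual implementation : each input size is timed with a monotonic clock after a few warm-up calls and a forced GC (GC stays disabled during the measured call), every candidate complexity is fitted by ordinary least squares on a transformed input size (1, n or n²), and the simplest model whose RMSE is within 10% of the best one wins. The method names (measure, fit, rmse, classify), the warm-up count and the tolerance are all hypothetical.

```ruby
# Hypothetical sketch: time a block for growing input sizes, then decide
# whether the timings look constant, linear or quadratic.

# Time one call per input size, after a few warm-up calls and a forced GC.
def measure(sizes, warmup: 3)
  sizes.map do |n|
    warmup.times { yield n }   # let the interpreter warm up before timing
    GC.start                   # start from a freshly collected heap
    GC.disable                 # keep GC pauses out of the measured call
    begin
      t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      yield n
      Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
    ensure
      GC.enable
    end
  end
end

# Ordinary least squares for y = intercept + slope * x.
def fit(xs, ys)
  count = xs.size.to_f
  mean_x = xs.sum / count
  mean_y = ys.sum / count
  var = xs.sum { |x| (x - mean_x)**2 }
  cov = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  slope = var.zero? ? 0.0 : cov / var   # a constant feature has no slope
  [mean_y - slope * mean_x, slope]
end

# Root mean squared error of the fitted line on the same data.
def rmse(xs, ys, intercept, slope)
  squared = xs.zip(ys).sum { |x, y| (intercept + slope * x - y)**2 }
  Math.sqrt(squared / xs.size)
end

# Candidate complexities, simplest first: each maps n to the feature to regress on.
MODELS = {
  constant:  ->(_n) { 1 },
  linear:    ->(n) { n },
  quadratic: ->(n) { n * n }
}.freeze

# Pick the simplest model whose RMSE is within `tolerance` of the best one
# (the tiny absolute term only guards against floating point ties).
def classify(sizes, timings, tolerance: 0.1)
  errors = MODELS.transform_values do |feature|
    xs = sizes.map(&feature)
    intercept, slope = fit(xs, timings)
    rmse(xs, timings, intercept, slope)
  end
  best = errors.values.min
  errors.find { |_, error| error <= best * (1 + tolerance) + 1e-9 }.first
end

# Usage: summing a freshly built array should be linear in n.
sizes = [10_000, 20_000, 40_000, 80_000, 160_000]
timings = measure(sizes) { |n| Array.new(n) { |i| i }.sum }
puts classify(sizes, timings)   # prints the name of the best-fitting model
```

Picking the simplest acceptable model rather than the lowest RMSE matters because a linear fit matches constant timings just as well (with a zero slope), so ties have to go to the lower complexity. Real timings are noisy, which is what the relative tolerance is for.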

It started well, and I made good progress. Unfortunately, as people dropped out of the group and more urgent things came up at work, I had to pause my project for a while. Only last week, during my holidays, did I get some time to finish it off. I’m currently at H18, and I’ve completed steps 1 to 9.

As I said, the project is still in early alpha. There are a lot of ways in which it could be improved (more complexities, faster, more reliable …). Even though I did not tackle the more advanced machine learning techniques, I now understand the overall process of ML : explore to get an intuitive grasp of the data, try out a model, see what happens, and repeat … I feel that learning these more advanced techniques would be easier now.

My opinion on the method

Overall, I found the experiment really effective : it’s possible to learn quite a lot by focusing on something for 20 hours. A word of warning though : you need to be really motivated and ready to stick with it through difficulties.

It’s also been very pleasant. I’ve always loved to learn new things, so I might be a little biased on that aspect. I liked the first part when I felt that there was a lot to learn in a large subject I knew almost nothing about. I loved the second part too, although this might be more related to machine learning, because I felt like exploring an unknown (data set) and trying to understand it.

I’ve never been afraid of learning something new, but this experiment taught me that I can learn anything fast ! I’ll definitely use it again.

One last word about doing this in a group. My own experiment did not work very well. Most people were not comfortable with the first ‘explore’ phase. I guess one could make it work better by starting 6 or 8 hours before the rest of the group, enough to grasp the basic concepts and come up with a few end goals. Having concrete targets from day 1 should help people stick with it and collaborate. The ‘guide’ could also help the others through the first phase.

Stay tuned, I’ll present my gem in an upcoming post.

Overclocking a Scrum Team to 12

From Wikipedia :

Overclocking is configuration of computer hardware components to operate faster than certified by the original manufacturer …

It is said that Scrum teams work best at 7 people, and that they break at about 10. The trouble is that sometimes there is just too much work for 7 people, but not enough for a full Scrum of Scrums. What if there was a simple way to hack this number up to 12 ?

An Idea

The Surgical Team

In his classic The Mythical Man-Month, Fred Brooks presents an idea to organize software development the way surgeons work. The master performs the surgery while the rest of his team (intern or junior surgeon and the nurses) are there to support him. Fred Brooks imagined an organization where master developers would be the only ones with access to the production code, while other, more junior developers would have the task of providing them with tools and technical libraries.

I admit that this idea sounds out-of-fashion in contrast with modern agile teams of generalists … Still …

Tools

At work, we are working on a pretty technical and complex product which requires some time getting into both the code and the domain. We took a few interns during the past years, and a bit like Fred Brooks, we came to the conclusion that internships yield more results when focused on building supporting tools rather than joining the team and working on production code.

We’ve also been doing retrospectives for 3 years now, we’ve stolen a lot of best practices from the industry and the team is working a lot better than it used to. The flip side is that nowadays, the opportunities for improvement that we discover are a lot more specific, and they often require us to take some time to build new tools to support our work.

The Agile Surgical Team

Agile methods such as Scrum or XP are all about creating real teams instead of a collection of individuals. That means that if we wanted to adopt the surgical team idea, we could use teams instead of individuals : a team of experts, and a tooling team of apprentice developers !

Why not ? There’s nothing really new here, but the challenge is to run such a tooling team efficiently !

  • 3 people or less : there’s evidence in the industry that micro teams can self organize in an ad-hoc manner
  • Mandate ScrumBan, Continuous Delivery and Devops : an on-site customer makes this possible, it should reduce project management overhead to almost nothing, and enforce quality
  • A sandbox for junior developers : there’s no risk of messing up production code here, the domain (tools for software developers) is straightforward and the fast feedback provides a great environment for learning

Obviously, for this to work, you’ll also need enough tooling work for a 3-person team. That’s usually the case : the CI alone can take quite some time (see Jez Humble’s talk Why Agile Doesn’t Work) and any team will have its own custom tools to build. For example, in our team, we built our own benchmark framework and we could benefit a lot from Saros on IntelliJ.

Not quite there yet

I promised to scale up to 12. Let’s do the maths :

  • 3 people in the tooling team
  • 8 people in the product team if we push Scrum a bit

That’s only 11, 1 is missing. This one is more specific to each team’s context.

As I said earlier, the product we are building is pretty technical and complex. Sometimes, we simply don’t know how we are going to do something. We need to try different ways before finding the right one. The typical agile way of doing that is by conducting time-boxed spikes. Spikes are fine for code and design related issues but way too short to deal with hard R&D problems. These need a lot of uninterrupted time for research and experiments, and it’s not really possible to split them into backlog stories that anyone can work on either …

The R&D Role

Here is what you want : some uninterrupted time to learn and experiment with different ways to do something difficult.

Here is what you don’t want :

  • specialists in the team
  • people out of sync with the daily production constraints
  • a never ending ‘research’ topic

Here is a simple solution in the context I describe : add someone to the product team, and rotate hard subjects among team members in 2-month round-robin turns. This should leave plenty of time to study something different, but not so much time that one loses connection with the rest of the team. Plus it brings a bit of diversity to everyone’s daily work. One issue I can think of is that working in isolation might lead someone down the wrong track. Regularly presenting what was done to the rest of the team might alleviate this concern.

A final word

Obviously, this has a smell of specialization : we’re bending Scrum principles a bit. So take it for what it is. Just like overclocking, it’s a hack to get a bit of extra juice before investing in something a lot more expensive (Scrum of Scrums, LeSS or whatever).

How to Kill Scrum Zombies ?

First of all, what is that ? Usually, Scrum zombies go in groups, and quite often, you’ll find a full team of them :

A typical team of Scrum Zombies follows Scrum pretty well, does all the ceremonies, has adopted good engineering practices, and might even be delivering OK. But all the fun is gone, everyone is on autopilot, no continuous improvement is happening anymore, retrospectives are dull and repetitive … There’s a gut feeling in the air that sooner or later, the project will miss a turn.

Sounds familiar ?

What’s going on exactly ?

When dev teams want to get more (agile|lean|reactive|.*) they often resort to hiring a full time coach. At first, a coach can have a great impact on the team. He will unblock change, show different ways of getting things done and train the team in new practices. Once all this is done, the coach becomes like any team member, or sometimes just leaves. That’s the point when the team, as a whole, has to take on responsibility for continuous improvement.

What’s needed then ?

The team needs to be able to conduct their own experiments and improvements. For this, they need divergent thinking, or creativity, or thinking out of the box; call it what you prefer. In a complex world, no single individual can bring all the answers to all the issues a team will face. Once the coach has put in place the practices necessary for continuous improvement, it’s up to the team.

Contrary to popular belief, creativity does not come out of thin air, it is cultivated !

Diversity in the team

Diversity does not mean minority quotas in your team. Diversity means diversity of interests, of ways of thinking, of mentalities, of ways of working … The more diverse your team members, the more likely they’ll find innovative ways to work out of their current problems.

Slack

Removing any slack from the planning is the surest way to kill creativity and innovation. Great ideas often come at unexpected moments (see Pragmatic Thinking and Learning: Refactor Your Wetware) because the mind works in the background to find them. You want to leave some time for that.

Go to conferences

Creativity builds on creativity. Great ideas are often adaptations of one or many existing ones. Going to conferences is a great way to collect a lot of ideas !

Share trainings and lectures

Different people might react differently to the same information. When a team member finishes reading a book or comes back from a training, it’s a great idea to have him present what he learned to the others. This will reinforce his own learning, but it might also trigger new ideas in his team mates.

A dash of turnover

Too much turnover can be fatal to a team, but not having any will bring other kinds of problems too. Newcomers will challenge the status quo, and the “this is how it’s done here” motto won’t be enough for them. That’s just what’s needed to trigger a healthy re-examination of the current practices. Oh … and turnover between teams is fine too ! If your company is large enough, you don’t need to hire or fire people to create turnover, just encourage them to move to other teams !

The tricky part of complexity

By leaving time for other things than just cranking out stories, life will come back into the project, and zombies should go away. But wait, there’s even more !

Software projects are pretty complex beasts. One of the most counter-intuitive things about such complex systems is that they make planning very difficult. Focusing too much on your main goal might be slowing you down !

In the face of complexity your project landscape is like a maze of tunnels ! Who said you’re choosing the best ones ? By keeping free time to explore other, seemingly unrelated, topics you might discover opportunities that will remove a lot of the work to get to your final destination !

Continuously Deliver a Rails App to Your DigitalOcean Box Using Docker

I decided to use my latest side project as an occasion to learn Docker. I first used Heroku as a platform for deployment (see previous post). It works fine but I discovered the following shortcomings :

  • Heroku does not deploy with Docker, which means that I’d get quite different configurations between dev and prod, defeating one of the promises of Docker :(
  • The dockerfile provided by docker runs bundle install in a directory outside of the docker main shared volume, which forces me to run bundle update twice (once to update Gemfile.lock and a second time to update the actual gems …)

Neither of these issues could be fixed without moving away from Heroku.

A great Tutorial / Guide

I followed Chris Stump’s great tutorials to setup Docker for my app, to continuously integrate on CircleCI and to continuously deploy on a private virtual server on DigitalOcean.

The first 2 steps (Docker & CI) worked out of the box after following the tutorial. Dealing with step 3 (CD) was a bit more complicated, because of :

  1. the specificities of DigitalOcean
  2. the fact that I’m no deployment expert …

What did I need to do to make it work

Setup SSH on the DigitalOcean box

I started by creating a one-click DigitalOcean box with Docker pre-installed. That’s the moment where I had to setup SSH in order to make CircleCI deploy to my box. DigitalOcean has a guide for this, but here is how I did it :

  1. Create a dedicated user on my dev machine : adduser digitaloceanssh
  2. Log in as this user (su digitaloceanssh) and generate SSH keys for it : ssh-keygen
  3. Print the public key (cat ~/.ssh/id_rsa.pub) and copy-paste it into your DigitalOcean box setup
  4. Print the private key (cat ~/.ssh/id_rsa) and copy-paste it into the SSH keys of your CircleCI job

You should now be able to SSH into your DigitalOcean box from the digitaloceanssh user : ssh root@<ip.to.digital.ocean>

Optional : update the box

The first time I logged into my box, I noted that packages were out of date. If you need it, updating the packages is a simple matter of apt-get update && apt-get upgrade

Fix deployment directory

By default, the home dir of the root user on the DigitalOcean box is /root/. Unfortunately, Chris Stump’s tutorial assumes it to be /home/root/. In order to fix that, I ssh-ed in the box and created a symbolic link : ln -s /root /home/root.

Install docker-compose on the box

Chris Stump’s tutorial expects docker-compose on the deployment box, but DigitalOcean only installs Docker on its boxes … Install instructions for docker-compose can be found here. Don’t use the container option : it does not inherit environment variables and will make the deployment fail. Just use the first, curl-based alternative.

Warning : replace ALL dockerexample

This may seem obvious, but be sure to replace all the references to ‘dockerexample’ with your own app name in all of Chris Stump’s templates (I forgot some and lost a few rebuilds because of it).

Create the production DB

Chris Stump’s deployment script assumes an existing production DB, so the first migration will fail. To fix this, just do the following :

  1. ssh into the DigitalOcean server
  2. run DEPLOY_TAG=<latest_deploy_tag> RAILS_ENV=production docker-compose -f docker-compose.production.yml run app bundle exec rake db:create

You can find the latest DEPLOY_TAG in the output of the CircleCI step bundle exec rake docker:deploy

How to access the logs

It can come in handy to check the logs of your production server ! Here is how to do it :

  1. ssh in your production server
  2. run the following to tail the logs : DEPLOY_TAG=`cat deploy.tag` RAILS_ENV=production docker-compose -f docker-compose.production.yml run app tail -f log/production.log

Obviously, tail is just an example, use anything else at your convenience.

Generate a secret token

Eventually, the build and deployment job succeeded … but I still had one last error when I tried to access the web site : An unhandled lowlevel error occurred. The application logs may have details. After some googling, I understood that this error occurs when you have not set a secret key base for your Rails app (details). Rails has a task to generate such a token (rake secret, or rails secret since Rails 5), so all that was needed was to generate one and create a .env file on the server with the following :

SECRET_KEY_BASE=<GENERATED_SECRET...>
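As far as I know, rake secret simply prints SecureRandom.hex(64). If Rails isn’t handy on the box, a plain-Ruby sketch along these lines could produce the .env line instead; the method name and the generate_secret.rb file name are hypothetical.

```ruby
require 'securerandom'

# Build a .env line with a fresh secret: 64 random bytes, hex-encoded
# into a 128-character string.
def secret_key_base_line
  "SECRET_KEY_BASE=#{SecureRandom.hex(64)}"
end

# When run as a script, print the line so it can be appended on the server,
# e.g. ruby generate_secret.rb >> .env  (hypothetical file name)
puts secret_key_base_line if $PROGRAM_NAME == __FILE__
```

Keep the .env file out of version control, since anyone holding this value can forge signed cookies.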

What’s next ?

Obviously, I learned quite a lot with this Docker exploration. I am still in the discovery phase, but my planning poker side project is now continuously built on CircleCI, and deployed to a DigitalOcean box.

The next steps (first, find a better subdomain, second, speed up the build job) will tell me if this kind of deployment is what I need for my pet projects. If it turns out too complicated or too difficult to maintain, Dokku is on my radar.

ReXP : Remote eXtreme Programming

My colleague Ahmad from Beirut gave a talk at Agile Tour Beirut on Saturday about how we adapted XP to a distributed team at work. I gave him a hand and played the remote guy during the talk.

With me through Skype, we did a first demo of remote pair programming on FizzBuzz using IDEA and Floobits

We then did a demo of remote retrospectives using Trello

When should I use ReXP

The conclusion is that :

  • If people are spread over 2 or a few cities, and there are enough of them in each place to build a team, just build a different team in every location
  • If people are spread over a lot of places, maybe across many time zones, then the open source, pull-request-based workflow seems the best
  • Otherwise, if there are not enough people to build 2 teams, they are spread over only a few locations, and the time difference is not too big, then stretching XP to Remote will work best

As it is said that “nothing beats XP for small collocated teams”, I guess “nothing beats ReXP for small almost collocated teams”.

Tools to make it better

As Ahmad said in his talk, tools already exist. We could add that more would be welcome :

  • Floobits or Saros help tremendously for remote pairing, but maybe cloud based IDEs like Eclipse Che or Cloud 9 will make all these tools useless !
  • Trello works well for remote retrospectives, but some great activities like the 5 whys are still difficult to do with Trello. I’m sure new tools could do better.
  • I’m currently building a remote planning poker app
  • My other colleague Morgan wants to build a virtual stand up token to make it flow

Finally, here are the slides :

EDIT 2016/11/23 : the full video is now on YouTube

3 More Great Talks From JavaOne 2016

After the top 5 talks I attended at JavaOne, here are 3 more !

Managing Open Source Contributions in Large Organizations

James Ward

This talk was very interesting for companies or organizations that want to use Open Source in some way without ignoring the risks.

After an introduction listing the benefits of contributing to open source, James explained the associated risks :

  • Security (evil contributions or information leaks)
  • Quality (bad contributions, increased maintenance or showing a bad image)
  • Legal (responsibility in case of patent infringing contribution, ownership of a contribution, licenses)

He then explained that there are 3 ways to deal with the issue :

| Strategy | Description | Pros | Cons | Popularity | Examples |
| --- | --- | --- | --- | --- | --- |
| Do nothing | Devs just contribute without telling anyone | Easy, gets it done | Need to stay under the radar; risks for all parties are ignored | +++++ | Most open source code on GitHub is shared in this manner |
| Join a foundation | Join an existing open source foundation, with its framework | Everything out of the box (infra, governance); builds trust | Rules can be heavy; ownership is given to the foundation | +++ | LinkedIn put Kafka in the Apache Foundation |
| Build tools | Use your own tools to mitigate the main risks of the ‘Do nothing’ strategy | Built on top of GitHub; keeps control; keeps things easy | Need to develop, test and operate the tools | + | Demo of a tool plugged into GitHub that enforces a contributor license agreement for anyone opening a pull request |

The ‘build tools’ strategy looks promising, even if it is not yet widely used !

Here are the talk and the slides on the author’s website.

Java Performance Analysis in Linux with Flame Graphs

Brendan Gregg

This is what a flame graph looks like :

Technically, it’s just an SVG with some JavaScript. It shows the performance big picture by aggregating data from Linux and JVM profilers. Vertically, you can see the call stacks in your system. The wider a block, the more time is spent inside that function (or in its sub calls). The top border is where the CPU time is actually spent. If you want to speed up your system, speed up the wider zones at the top of the graph.
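Brendan Gregg’s flamegraph.pl script takes “folded” stacks as input : one line per distinct stack, frames separated by semicolons, followed by the number of samples. The Ruby sketch below is my own illustration of that aggregation step, not part of his tooling; the sampled stacks are made up. It also separates the share of samples in which a function appears anywhere in the stack from its “self” share, i.e. the samples where it sits on top, at the border where CPU time is spent.

```ruby
# Each sample is one captured call stack, listed from the root down to the
# function that was on the CPU (the top of the flame graph).

# Collapse identical stacks into "frame;frame;frame count" lines.
def folded_lines(samples)
  counts = samples.each_with_object(Hash.new(0)) { |stack, h| h[stack.join(';')] += 1 }
  counts.sort.map { |stack, count| "#{stack} #{count}" }
end

# Fraction of samples in which a function appears anywhere in the stack,
# roughly the summed width of its blocks.
def total_share(samples)
  counts = samples.each_with_object(Hash.new(0)) do |stack, h|
    stack.uniq.each { |frame| h[frame] += 1 }   # count recursion once per sample
  end
  counts.transform_values { |c| c.fdiv(samples.size) }
end

# Fraction of samples in which a function itself was on the CPU (top edge).
def self_share(samples)
  counts = samples.each_with_object(Hash.new(0)) { |stack, h| h[stack.last] += 1 }
  counts.transform_values { |c| c.fdiv(samples.size) }
end

samples = [%w[main parse], %w[main parse], %w[main render], %w[main]]
puts folded_lines(samples)
# main 1
# main;parse 2
# main;render 1
```

Here main spans every sample but is on the CPU in only a quarter of them, while parse owns half of the top edge, which is where speeding things up would pay off.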

The speaker is a performance engineer at Netflix, and his job is to build tools that help other teams discover performance issues. This is how they use Flame Graphs :

  • Compare 2 flame graphs at different times to see what changed
  • Do a canary release and compare the new flame graph before finishing the deployment
  • Taking continuous flame graphs on running services helps identify JVM behavior like JIT or GC
  • They use different color themes to highlight different things
  • They also use them to identify CPU cache misses

By the way, I also thought this was a great example of using an innovative visualization to manage tons of data.

I could find neither the video nor the slides of the talk, but I managed to find a lot of others talks about Flame Graphs, as well as extra material on the speaker’s homepage.

Increasing Code Quality with Gamification

Alexander Chatzizacharias

You might be wondering why we should care about gamification ?

  • Worldwide 11.2 billion hours are spent playing every week !
  • People love to play because it makes them feel awesome
  • Games are good teachers
  • At work we are the ones who need to make others successful
  • But only 32% of workers are engaged in their work !

Games rely on 4 main dynamics :

  • Competition (be very careful of closed economics which can be very bad for teams)
  • Peer pressure (Public stats push teams and individuals to conform to the norm)
  • Progression (regular recognition of new skills is motivating)
  • Rewards (Badges, Level ups, Monkey Money, real money …)

He went on to demonstrate two games that are based on Jenkins and Sonar that aim at better code quality :

  • One mobile app developed during a 24h Hackathon at CGI which might be open sourced at some point
  • Another one called ‘Dev Cube’ created at a university, where you get to decorate your virtual cubicle

At the end of the talk, he gave the following recommendations :

  • Understand the needs of all to respond to everyone’s personal goals
  • Don’t assign things to do, that’s not fun, give rewards instead
  • Keep managers out of the picture
  • To keep it going, you need regular improvements, special events and new rules
  • KISS !

Playing at work might not be unproductive in the end !

Here is the same talk given at NLJUG. Unfortunately, it’s in Dutch, but the English slides are here.

Top 5 Talks I Attended at JavaOne 2016 (Part 2)

This is my second post relating the talks I attended at JavaOne 2016. Here is the beginning of the story. Here we go.

Euphoria Despite the Despair

Holly Cummins

Our jobs aren’t always fun … and that’s in fact an issue ! Studies show that people who have fun at work are 31% more productive ! The talk was organized in 3 parts :

  1. What is fun ?
  2. How to remove the parts that are not fun ?
  3. How to add even more fun ?

She defined what she called the funtinuum, which is that fun is a function of engagement and interaction. Basically, you won’t have fun if you are doing nothing, or if no one cares about your work. That aligns well with Daniel Pink’s drivers of motivation : Autonomy, Mastery and Purpose.

If something is not fun, it’s because it does not require engagement or interaction. It’s either boring or no one cares, or both. If that’s the case, it’s probably some kind of waste … Removing un-fun activities would mean removing waste. It’s interesting to note how much this sounds like Lean’s Muda (waste) ! She gave examples such as :

  • automate stuff
  • pair programming transforms criticism into collaboration (bonus: it gives an excuse to skip meetings)
  • go #NoEstimates because estimating is painful and useless
  • YAGNI defers useless things until they really add value
  • Organize to skip meetings and other boring stuff

Last step is to add fun to the workplace. She warned that adding fun before removing the un-fun stuff would feel fake and would make things worse …

To add fun, she suggested using things like :

  • gamification (there was actually another great talk about gamification)
  • build a hacking contest instead of a security training
  • Install a Siren of Shame for whoever breaks the build

Here are the slides

Java 9: The Quest for Very Large Heaps

Bernard Traversat, Antoine Chambille

This talk might not be of interest to everyone, but it is for us at work. It went through the improvements coming to Java 9’s G1 garbage collector. To summarize, to scale to very large heaps, it splits the memory into regions. Objects should be allocated in different regions depending on their characteristics, which might help to build NUMA-aware applications. Having the heap split into smaller chunks enables the GC to run in parallel, which can speed up old generation GC by up to 50 times !

Java 9 is scheduled for March 2017.

Agility and Robustness: Clojure + spec

Stuart Halloway

I haven’t touched Clojure for a while, but I gave the language a try a few years ago. I had heard about Clojure spec but hadn’t taken the time to look at it in detail. As I understood it, spec is like Design by Contract on steroids ! Clojure is not statically typed, but you can now attach spec metadata to values. A spec is roughly a predicate. By defining specs for the inputs and outputs of functions, it is possible to verify at runtime that the functions behave correctly.
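To make the “a spec is roughly a predicate” idea concrete, here is a small Ruby sketch of runtime contract checking on a function’s arguments and result, loosely mirroring the :args and :ret parts of spec’s fdef. It is not Clojure spec’s API : Spec, ContractError and with_contract are hypothetical names.

```ruby
# A "spec" in this sketch is just a named predicate.
Spec = Struct.new(:name, :pred) do
  def valid?(value)
    pred.call(value)
  end
end

class ContractError < StandardError; end

# Wrap a callable so its arguments and its result are checked on every call.
def with_contract(fn, args:, ret:)
  lambda do |*call_args|
    unless args.valid?(call_args)
      raise ContractError, "arguments #{call_args.inspect} do not satisfy #{args.name}"
    end
    result = fn.call(*call_args)
    raise ContractError, "result #{result.inspect} does not satisfy #{ret.name}" unless ret.valid?(result)
    result
  end
end

ONE_INT_LIST = Spec.new('one list of integers',
                        ->(a) { a.size == 1 && a[0].is_a?(Array) && a[0].all? { |x| x.is_a?(Integer) } })
SORTED = Spec.new('sorted list', ->(ys) { ys.each_cons(2).all? { |a, b| a <= b } })

checked_sort = with_contract(->(xs) { xs.sort }, args: ONE_INT_LIST, ret: SORTED)
broken_sort  = with_contract(->(xs) { xs.reverse }, args: ONE_INT_LIST, ret: SORTED)

p checked_sort.call([3, 1, 2])   # prints [1, 2, 3]
```

Calling broken_sort.call([1, 2, 3]) raises ContractError because the result is not sorted; checks like these would typically be switched on during development and tests rather than in production.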

Much like Bertrand Meyer in the classic OOSC2, who advised checking contracts during development only, Stuart explained that we should care about development time vs production time rather than compile time vs runtime. From this point of view, it does not matter much whether the compiler or the continuously running test suite finds an issue.

But specs are a lot more than predicates ! They can be used to :

  • enable assertions at runtime
  • generate documentation
  • generate test cases
  • generate precise call logs
  • get precise error messages
  • explore a function and see how it can be called

He went on to compare the virtues of Clojure spec with those of static typing (à la Java) and of example-based testing.

Although I don’t believe that generative testing can ever replace example based testing altogether, it certainly can help.
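To show what generative testing adds, here is a hand-rolled Ruby sketch (no property-testing library; find_counterexample and the generator are hypothetical names) : a sort that silently drops duplicates passes a typical example-based test, while random inputs checked against a general property expose it.

```ruby
# Property-based check: run the given property on many generated inputs and
# return the first input that breaks it, or nil if none was found.
def find_counterexample(generator, trials: 100, seed: 42)
  rng = Random.new(seed)   # fixed seed so failures are reproducible
  trials.times do
    input = generator.call(rng)
    return input unless yield(input)
  end
  nil
end

# Generator for random lists of small integers (duplicates are likely).
int_lists = ->(rng) { Array.new(rng.rand(0..20)) { rng.rand(-100..100) } }

# Property of any correct sort: the output is ordered and has the same elements.
def sorted_permutation?(input, output)
  counts = ->(xs) { xs.group_by(&:itself).transform_values(&:size) }
  output.each_cons(2).all? { |a, b| a <= b } && counts.call(input) == counts.call(output)
end

good_sort  = ->(xs) { xs.sort }
buggy_sort = ->(xs) { xs.uniq.sort }   # silently drops duplicates

# An example-based test with distinct values does not notice the bug...
p(buggy_sort.call([3, 1, 2]) == [1, 2, 3])
# ...but generated inputs will almost surely contain duplicates and expose it.
counterexample = find_counterexample(int_lists) { |xs| sorted_permutation?(xs, buggy_sort.call(xs)) }
p counterexample
```

Real tools also shrink the counterexample to a minimal failing input, which this sketch does not attempt.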

All in all, the presentation was insanely great and engaging. It made me seriously think of going into Clojure programming again !

Here are the slides and the same talk at Strangeloop

Conclusion

Overall, JavaOne was great ! If I had the opportunity, I’d go back every year ! There were a lot of other great talks I did not write about in these 2 posts, for example :

  • Development Horror Stories was a lot of fun, especially the winning story !
  • Hacking Hiring was full of good advice
  • Managing Open Source Contributions in Large Organizations was full of good ideas
  • Increasing Code Quality with Gamification was very inspiring

Edit 17 October 2016

I summarized 3 other JavaOne talks here.

Top 5 Talks I Attended at JavaOne 2016 (Part 1)

Along with a few colleagues, I had the chance to be sent by my company to San Francisco last week to attend JavaOne 2016.

Here is a super short list of the talks I attended that I found really interesting

Preventing errors before they happen

Werner Dietl & Michael Ernst

Since Java 6, it is possible to pass custom annotation processors to javac. Since Java 8, it is possible to add annotations to types. The people behind the Checker Framework used this to create pluggable type systems for your Java programs. These type systems enforce properties on your program, and emit warnings or errors at compile time when those properties are violated.

Here are a few examples :

  • declare @Immutable MyObject myObject to make sure that myObject won’t be mutated
  • declare @NonNull MyObject myObject to make sure that myObject is never null

Under the hood, the compiler behaves as if @Immutable MyObject and MyObject were completely separate types, and it knows and tracks specific ways of converting between the two. The framework provides a simple API to define your own type systems. They did a live demo showing how to quickly define things like @Regex String, @Encrypted String or @Untainted String (which forbids user input strings to avoid SQL injections).

The talk was really interesting : the framework seems lightweight and seems to integrate well with the typical tool stack. I will definitely give it a try the next time I have a bit of slack time.

Here are the slides and a previous session of the presentation

Keeping Your CI/CD Pipeline as Fast as It Needs to Be

Abraham Marin-Perez

Continuous Delivery and Microservices are what you need to do, aren’t they ? Well, when you actually try to set up a CI / CD pipeline for all your code, things get complicated pretty fast ! The speaker presented how to deal with this complexity by using metrics from your VCS and build servers to draw an annotated graph of your build pipeline.

  • He used the build time to set the size of every node : the longer, the larger
  • He used the change rate to set the color of every node : the more often it was built, the warmer the color

It was then possible to determine other metrics such as :

  • the impact time of every node : build time + build time of all the dependencies
  • the weighted impact time : impact time * change rate
  • the overall average impact time : sum of all the weighted impact times
  • the overall max impact time : max of all the impact times

Using these metrics and your SLAs, it is possible to define policies for your build times, such as “the max build time should not be more than X”. If you want to speed up your build, you can set a target build time; analyzing the graph should then help you understand which architecture changes you need to make to your system in order to meet it (this sounds a lot like Toyota’s Improvement Kata …)
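Here is a minimal sketch of the four metrics listed above, computed over a small, entirely hypothetical pipeline. It assumes that “dependencies” means the transitive set of modules listed under each node (each counted once, even when reachable by two paths), that builds run sequentially, and that change rates are fractions summing to 1, which is what makes the sum of weighted impact times an average.

```java
import java.util.*;

public class BuildImpact {
    // One node of the build graph; all sample values below are hypothetical.
    record Module(double buildTime, double changeRate, List<String> dependencies) {}

    // Impact time: own build time + build times of all transitive dependencies,
    // each counted once. Assumes sequential builds (no parallel steps).
    static double impactTime(Map<String, Module> graph, String name) {
        Set<String> reached = new HashSet<>();
        collect(graph, name, reached);
        double total = 0;
        for (String n : reached) total += graph.get(n).buildTime();
        return total;
    }

    private static void collect(Map<String, Module> graph, String name, Set<String> reached) {
        if (!reached.add(name)) return; // already visited, also guards against cycles
        for (String dep : graph.get(name).dependencies()) collect(graph, dep, reached);
    }

    // Weighted impact time: impact time * change rate.
    static double weightedImpactTime(Map<String, Module> graph, String name) {
        return impactTime(graph, name) * graph.get(name).changeRate();
    }

    // Overall average: sum of weighted impact times (change rates assumed to sum to 1).
    static double averageImpactTime(Map<String, Module> graph) {
        double sum = 0;
        for (String name : graph.keySet()) sum += weightedImpactTime(graph, name);
        return sum;
    }

    // Overall max: the largest impact time of any node.
    static double maxImpactTime(Map<String, Module> graph) {
        double max = 0;
        for (String name : graph.keySet()) max = Math.max(max, impactTime(graph, name));
        return max;
    }

    // Hypothetical 3-module pipeline: build times in minutes, change rates as fractions.
    static Map<String, Module> sampleGraph() {
        Map<String, Module> graph = new LinkedHashMap<>();
        graph.put("core", new Module(5, 0.2, List.of()));
        graph.put("api", new Module(3, 0.3, List.of("core")));
        graph.put("web", new Module(4, 0.5, List.of("api", "core")));
        return graph;
    }

    public static void main(String[] args) {
        Map<String, Module> graph = sampleGraph();
        for (String name : graph.keySet()) {
            System.out.printf("%s: impact=%.1f weighted=%.2f%n",
                    name, impactTime(graph, name), weightedImpactTime(graph, name));
        }
        System.out.printf("average=%.2f max=%.1f%n", averageImpactTime(graph), maxImpactTime(graph));
    }
}
```

In this sample, “web” reaches “core” both directly and through “api”, and the visited set makes sure its 5 minutes are counted once. A policy like “the max build time should not be more than X” then becomes a simple check against maxImpactTime.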

I loved this talk ! I found the speaker captivating, and he presented novel ideas, which is not always the case.

Here are the slides, and the same presentation at Devoxx UK.

To Be Continued

I promised 5, and that’s only 2 talks ! Stay tuned, I’ll write about the other 3 in the coming weeks. Here they are.

Flavors of TDD

Over years of coding dojos with the same circle of people, I came up with my own style of practicing TDD. Lately, I had the chance to do a pair programming session with someone I did not know. That made me realize that there are in fact even more ways to practice TDD than I thought.

Mockist vs Classicist

A lot has already been written (and discussed) about these two approaches. I have already blogged about the subject myself, and I even gave a talk about it. From my own point of view, the drawbacks of making mocking the default far outweigh the benefits. I’m not saying that mocks aren’t useful from time to time, but rather that they should remain the exception.

Top-Down vs Bottom-Up

That’s the reason why I wrote this post : this is the main difference I found between my style and my pair’s. Let me explain.

Top-Down

Doing TDD top-down means starting with high-level end-to-end tests, implementing the minimum to make them pass, refactoring and repeating. A bit like BDD, the point is to focus on the expected behavior and avoid writing useless things. The downside is that the refactoring part can get pretty difficult. On real-life code, strictly following top-down would mean writing a feature test first, passing it with a quick and dirty implementation, and then spending hours trying to refactor all that mess … good luck !

Here is another example, from coding dojos this time. Having had success with the top-down approach during previous dojos, we once intentionally tried to code Conway’s Game of Life top-down. We did so by writing high-level tests that checked special patterns (gliders …). That was a nightmare ! It felt like trying to reverse engineer the rules of the game from real use cases. It got us nowhere.

Bottom-Up

At the other end of the spectrum, you can do bottom-up TDD. This means unit testing and implementing all the small bricks you think you’ll need to provide the expected overall feature. The idea is to get fast feedback on what you are coding, rather than working for a long time in a tunnel. The downside is that you might be coding something that will end up being unnecessary. Be careful : if you find yourself spending a lot of time building up utility classes, you might be doing too much bottom-up implementation.
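As a hypothetical illustration of such bricks, here are two small pieces one might unit test and build bottom-up for the Game of Life mentioned above : the survival/birth rule for a single cell, and a live-neighbour counter. The names and the set-of-live-cells representation are assumptions of this sketch, not the code from our dojo.

```java
import java.util.Set;

public class LifeBricks {
    // Coordinates of a cell; the board is modelled as the set of live cells.
    record Cell(int x, int y) {}

    // Brick 1: a cell is alive next generation if it has exactly 3 live
    // neighbours, or if it is already alive and has exactly 2.
    static boolean nextState(boolean alive, int liveNeighbours) {
        return liveNeighbours == 3 || (alive && liveNeighbours == 2);
    }

    // Brick 2: count the live cells among the 8 surrounding positions.
    static int liveNeighbours(Set<Cell> live, Cell cell) {
        int count = 0;
        for (int dx = -1; dx <= 1; dx++) {
            for (int dy = -1; dy <= 1; dy++) {
                boolean isSelf = dx == 0 && dy == 0;
                if (!isSelf && live.contains(new Cell(cell.x() + dx, cell.y() + dy))) count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Horizontal blinker: the centre cell has 2 live neighbours and survives.
        Set<Cell> blinker = Set.of(new Cell(0, 1), new Cell(1, 1), new Cell(2, 1));
        Cell centre = new Cell(1, 1);
        System.out.println(nextState(blinker.contains(centre), liveNeighbours(blinker, centre)));
    }
}
```

Each brick is trivial to test in isolation; the risk the paragraph warns about is piling up many such bricks before knowing which ones the overall feature really needs.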

The Numerals to Romans Kata is a good exercise to fail at bottom-up. Every time I did this exercise during a coding dojo, people new to it would come up with complicated ways to do it (often involving complex array manipulation). Compared to that, applying disciplined top-down TDD brings a brutally effective solution for Numerals to Romans.
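For reference, here is one common shape that such a solution tends to take once tests like 1 → “I”, 4 → “IV” and 9 → “IX” have driven the design : a single table of values and symbols, including the subtractive pairs, consumed greedily. This is a sketch, not necessarily what our dojos ended up with, and the 1 to 3999 range check is an assumption based on standard Roman numerals.

```java
public class RomanNumerals {
    // Values in decreasing order, with subtractive forms (CM, CD, XC, ...) included,
    // so no special-case logic is needed for 4, 9, 40, 90, 400 or 900.
    private static final int[] VALUES = {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
    private static final String[] SYMBOLS = {"M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"};

    static String toRoman(int number) {
        if (number < 1 || number > 3999) {
            throw new IllegalArgumentException("out of range: " + number);
        }
        StringBuilder roman = new StringBuilder();
        // Greedily emit the largest symbol that still fits.
        for (int i = 0; i < VALUES.length; i++) {
            while (number >= VALUES[i]) {
                roman.append(SYMBOLS[i]);
                number -= VALUES[i];
            }
        }
        return roman.toString();
    }

    public static void main(String[] args) {
        System.out.println(toRoman(1994)); // MCMXCIV
    }
}
```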

Mixed approach

Both approaches have their pros and cons. I really believe developers who are serious about TDD should master both, and learn when to apply each. In fact, as is often the case, the best approach lies somewhere in the middle. Here’s my recipe :

  1. Start with a high-level feature test
  2. Try to make it pass …
  3. … (usually) fail
  4. Roll back or shelve your test and draft implementation
  5. Build a brick
  6. Unshelve
  7. Try to make it pass …
  8. … and so on, until the high-level test finally passes.

It’s a lot like the Mikado Method, applied to building features instead of refactoring.

Practice in dojos

It’s possible to practice this intentionally in coding dojos as well. Most katas should be OK, as long as the group agrees up front to solve them using this particular approach.

If, during the dojo, you’ve just written a test and suddenly realize that it won’t be easy to get it passing because the elements it needs are spread out in your code, this is the time ! Comment out the test, get the green bar, refactor, uncomment the test, try to make it pass, repeat … Eventually, you’ll have all the bricks to make your test easy to pass.

Some might say this is not ‘pure’ TDD, but that sounds like cargo cult to me ! As long as you make sure you are not building useless stuff, and that you keep the feedback loop as short as possible, you’re on the right track.