Automatic Travis Daily Build With Heroku Scheduler

As I just released auchandirect-scrAPI, which relies on scraping, I needed a daily build.

The Travis team is already working on this, and I found a small utility app called TravisCron where anyone can register their repo for automatic builds.

Unfortunately, the feature is not yet ready in Travis, and the TravisCron guys have not yet activated my repo. After having a look at the TravisCron source code and the Travis API, I found out that it is really simple to do the same thing on my own.

That’s how I created daily-travis. It’s a tiny Rake task, ready to be pushed and automatically scheduled on Heroku, that will restart the latest build when run.

Details are in the README

@Travis : Thanks again for your service.

Auchandirect-ScrAPI : An Unofficial API Ruby Gem

Every brand should provide an API for developers … unfortunately, that is far from the truth right now. A few years ago, when I started my mes-courses.fr side project, I would have loved to find a French online grocery providing an open API. I had to resort to scraping (that’s how I learnt that heavily relying on scraping for a 15hr/week side project is not a good fit … but that’s another story).

As I am taking mes-courses.fr down, I have extracted the whole unofficial API I had built around http://www.auchandirect.fr (I’m talking to you, French hackers !) into an open source Ruby Gem. Briefly :

  • It walks the whole store, from categories to items
  • Given valid credentials, it can fill and save a cart
  • It’s LGPL : anyone can use it, as long as they give back any improvements to the community
  • It’s using Storexplore, another open source Ruby Gem extracted from mes-courses.fr
  • It’s tested on Travis, and I’m currently trying to get it tested daily with TravisCron

There’s mainly one thing it cannot do :

  • It cannot proceed to any payment or ordering

It’s available on Github

Happy scraping !

Programming as an Exponential Problem

As Tom Cargill said :

The first 90% of the code accounts for the first 90% of the development time. The remaining 10% of the code accounts for the other 90% of the development time.

By extrapolation, this would mean that every time we increase the requirements by 10%, we need to double the total development time ! That would mean that solution complexity is an exponential function of the complexity of the problem.

That could explain why techniques that work well for small problems don’t work well at all for large problems, and vice versa. For example :

  In the small (think one-page script)   In the large (think multi-million-line system)
  ------------------------------------   ----------------------------------------------
  Dynamic typing                         Static typing
  Mutable globals                        Immutability
  Imperative style                       Declarative style
  Manual memory management               Garbage collection
  Shared memory                          Message passing

Just for fun, let’s suppose that we could deduce a unique constant C for every language such that

Here is a plot of this formula with different values of C (0.5, 1 and 2)
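The formula itself is not shown here, so as a purely illustrative guess, suppose solution(x) = C · e^(x / C) : it matches the behaviour described below, with small values of C cheapest for small problems and large values cheapest for large ones. A quick Ruby sketch :

```ruby
# Purely illustrative : one formula consistent with the described behaviour.
# solution(x) = C * e^(x / C) -- the exact formula is a guess, not the original.
def solution_complexity(problem, c)
  c * Math.exp(problem / c)
end

# Sample a few problem sizes for each value of C used in the plot
[0.5, 1.0, 2.0].each do |c|
  points = [0.5, 1.0, 5.0].map { |x| solution_complexity(x, c).round(1) }
  puts "C=#{c} -> #{points.inspect}"
end
```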

We can see that small values of C are best for small problems, whereas greater values scale better with larger problems. For a given problem, there is quite a difference in solution complexity between the curves. If the formula were true, and we knew in which complexity zone our problem would always stay, we could choose the appropriate technology ! Experienced engineers already have the gut knowledge to choose the right tool for the job.

That’s not all : let’s have a bird’s eye view of the same formulas.

When I increased the maximum problem complexity by a factor of 3, I had to multiply the solution complexity by 100 ! In the end, these exponential curves all seem frighteningly vertical. This could explain why the divide and conquer approach works so well in software : 2e^x < e^(2x). Abstract and powerful APIs might be our best weapon against complexity.

People’s behaviour does not match this exponential hypothesis though :

  • At work, I’ve seen quite a few projects started from scratch, with everybody expecting them to maintain their initial speed during their whole lifetime
  • Some recent hiring and investing trends seem to rely on hackathons, startup weekends, or coding games, all ‘in the small’ exercises
  • I’ve observed quick and dirty overtime work to meet a deadline … if productivity falls as solution complexity rises, that crunch mode would be completely unproductive

This leads to more interesting questions :

  • Is my exponential model complete garbage ?
  • Or are humans particularly bad at forecasting an exponential behaviour ?
  • If so, what practices could we adopt to stop relying on this misleading gut feeling ?

Retroactively Add Keywords for Your Existing Octopress Posts

At the moment, I am exploring the world of SEO, so I thought I could start with my blog. I found SEO for Octopress websites, which I followed to add keywords and descriptions to this blog.

To fill in actual keywords for all my existing posts, I had 2 options :

  • edit around 60 posts by hand
  • write a script to parse the YAML post descriptions, then extract and inject keywords

Sorry, I chose the geeky solution …

Just add this code to your toplevel Rakefile, and run bundle exec rake add_keywords and keywords will be added to your existing posts.
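The snippet itself is not reproduced above; a minimal sketch of such a Rake task, assuming Octopress’s source/_posts layout and deriving each post’s keywords from its categories (both assumptions), could look like this :

```ruby
require 'rake'
require 'yaml'
extend Rake::DSL

# Inject a 'keywords' entry into a post's YAML front matter.
# Deriving keywords from the 'categories' entry is an assumption for illustration.
def add_keywords_to(post_content)
  _, front_matter, body = post_content.split(/^---\s*\n/, 3)
  meta = YAML.load(front_matter)
  return post_content if meta['keywords'] # don't overwrite existing keywords

  meta['keywords'] = Array(meta['categories']).join(', ')
  "#{YAML.dump(meta)}---\n#{body}"
end

# Hypothetical task : rewrites every post under source/_posts
task :add_keywords do
  Dir.glob('source/_posts/*.markdown') do |path|
    File.write(path, add_keywords_to(File.read(path)))
  end
end
```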

My Humble Advice About How to Write Maintainable Tests

I love writing automated tests … or rather, I hate having to work in untested code. I find it makes my life unnecessarily stressful. On the other hand, the cost of maintaining badly written tests can sometimes outweigh their benefits. This is usually the moment where the team resorts to manual testing, and gets back to the ways of ‘the good old days’. Personally, I don’t like the good old days when we had to stay up all night to add even more mess to fix something for an important deadline.

Here is how I try to make my tests as maintainable as possible :

  • Write the tests before the code : it takes a (short) time to get used to, but after that, it’s just a lot more fun. Just try it for a while
  • Write tests with no side effects, otherwise it will not be possible to run your tests alone, or in a different order ! So don’t use globals
  • Write readable tests : have you ever had to fix a test whose intent you could not figure out ? A lot of the other points just help writing more readable tests
  • Write small tests : they are usually faster to run, allow you to test more edge cases, and do a better job at pinpointing the actual faulty code. The recipe for writing short tests is to follow the given-when-then pattern :
    • start your test by setting the context (given)
    • do the thing you actually want to test (when)
    • verify that it did what you wanted (then)
  • Remove code duplication from your tests, in the same way as you would in production code. This will help you when you want to modify that constructor that is used in 764 test files …
  • Use test data builders. This will avoid duplicated and long context setup at the beginning of every test. Don’t use factory methods or the Object Mother pattern : they just do not scale. In Java, this usually means rolling your own; in Ruby, just use Factory Girl
  • Use custom assertion objects. This will avoid duplicated and complicated verification code at the end of every test. It will also help to improve assertion messages. In Ruby, this comes built into RSpec and its matcher DSL. Lately, in Java, I have been using FEST-Assert
  • Use the extended red –> red with explicit error message –> green –> refactor in place of the shorter red –> green –> refactor. By spending some time improving your assertion messages, you’ll eventually save time understanding what broke when a test fails
  • As I already wrote about, only use mocks to
    • speed up a test that is too slow
    • cut off a dependency to an unavailable subsystem
    • simplify an overly complex test setup
  • Use constructor based dependency injection. It’s straightforward, low tech, and simplifies test setup
  • As there is no need to mock immutable data structures, I found that using immutable classes for values simplifies tests
  • Usually, use hand-coded mocks. Hand-coded mocks become difficult to maintain when the code becomes too tangled, so they help me notice that I am doing something wrong (not mocking at the correct place, testing the implementation, not doing enough refactoring …). On the contrary, mocking frameworks make this so easy that I usually miss the issue completely until it is too late
  • Except when your mocking framework provides object proxying and automatic unmocking : Ruby’s RR provides both, and they can be really useful when needed
  • Use existing mocks when possible. For example, an SQLite in-memory database : it speeds up the tests, removes the need for any environment setup, and is usually very simple to set up
  • Last of all, listen to your tests : if they get difficult to write, there might be a design improvement opportunity lying somewhere
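To make the given-when-then shape and the test data builder idea concrete, here is a self-contained Ruby sketch (the Order and OrderBuilder classes are made up for illustration) :

```ruby
# Hypothetical production class, just enough to test against
class Order
  def initialize(item_prices, discount)
    @item_prices = item_prices
    @discount = discount
  end

  def total
    @item_prices.sum * (1 - @discount)
  end
end

# Test data builder : sensible defaults keep each test's setup short
class OrderBuilder
  def initialize
    @item_prices = [10.0]
    @discount = 0.0
  end

  def with_items(*prices)
    @item_prices = prices
    self
  end

  def with_discount(discount)
    @discount = discount
    self
  end

  def build
    Order.new(@item_prices, @discount)
  end
end

def test_applies_the_discount_to_the_total
  # given : an order with two items and a 25% discount
  order = OrderBuilder.new.with_items(100.0, 50.0).with_discount(0.25).build

  # when : computing the total
  total = order.total

  # then : the discount is applied to the sum of the items
  raise "expected 112.5, got #{total}" unless total == 112.5
end

test_applies_the_discount_to_the_total
```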
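Likewise, constructor-based dependency injection pairs naturally with a hand-coded mock; here is a sketch (Cart, FakeCatalog and the price_of interface are hypothetical) :

```ruby
class Cart
  def initialize(catalog) # the dependency is injected through the constructor
    @catalog = catalog
    @item_names = []
  end

  def add(name)
    @item_names << name
  end

  def total
    @item_names.sum { |name| @catalog.price_of(name) }
  end
end

# Hand-coded mock : same interface as the real catalog, canned answers
class FakeCatalog
  def initialize(prices)
    @prices = prices
  end

  def price_of(name)
    @prices.fetch(name)
  end
end

# In a test, the fake slots in with no framework at all
catalog = FakeCatalog.new('apple' => 1.5, 'bread' => 2.0)
cart = Cart.new(catalog)
cart.add('apple')
cart.add('bread')
raise "unexpected total" unless cart.total == 3.5
```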

All in all, there is nothing new here. A lot of it comes from GOOS, other points from Clean Code; the mocking ‘requirements’ come from an article by Gregory Brown; I found the rest through my own experience and a lot of other sources I cannot remember now …

Happy testing !

Online Store Scraping DSL Gem

Since I decided to stop Mes Courses to focus on AgileAvatars, I have been extracting open source gems from the code base. The last one is Storexplore : a declarative scraping DSL that lets one define directory-like APIs for online stores.

As explained in the Readme, it allows one to declare a store this way :

Storexplore::define_api 'dummy-store.com' do

  categories 'a.category' do
    attributes do
      { :name => page.get_one("h1").content }
    end

    categories 'a.category' do
      attributes do
        { :name => page.get_one("h1").content }
      end

      items 'a.item' do
        attributes do
          {
            :name => page.get_one('h1').content,
            :brand => page.get_one('#brand').content,
            :price => page.get_one('#price').content.to_f,
            :image => page.get_one('#image').content,
            :remote_id => page.get_one('#remote_id').content
          }
        end
      end
    end
  end
end

And to use it like this :

Api.browse('http://www.dummy-store.com').categories.each do |category|

  puts "category: #{category.title}"
  puts "attributes: #{category.attributes}"

  category.categories.each do |sub_category|

    puts "  category: #{sub_category.title}"
    puts "  attributes: #{sub_category.attributes}"

    sub_category.items.each do |item|

      puts "    item: #{item.title}"
      puts "    attributes: #{item.attributes}"

    end
  end
end

I tried my best to make this library high quality :

  • The code evolved from a simple procedural script to a DSL through constant refactoring
  • Real world features like constant memory usage have been added to fix production bugs
  • Documented with samples and rdoc
  • Extensive automated testing

Let’s hope it will be useful to some.

Sprints Are Not Sprints

I really don’t know why Scrum Sprints are called sprints ! From my experience, the number one mistake made by teams starting with Scrum is to work as quickly and dirtily as possible to complete the sprint, forgetting the sustainable pace.

Finding another word is difficult though. I thought of ‘stage’ or ‘milestone’, which both convey the long-run idea, but both feel more content-bounded than time-bounded. A more exotic option could be ‘Scrum push’ : it conveys slow and intense action rather than quick results.

Overall, the traditional agile ‘iteration’ is not bad at all, at least a lot better than Sprint.

EDIT 01/08/2014:

The ‘Quick and Dirty’ Sprint strategy is like trying to win a marathon with a greedy algorithm :

while not finished
  sprint(100m)
end

Not likely to work … Marathoners know that they’ve got to stick to a constant speed during the whole race in order to finish it. The way to get faster is to :

  • increase this cruise speed just a bit
  • get at ease with it during a few races
  • repeat

Is there something to learn from this to improve software development speed ?

Trying to Explain Monads in Java

A few days ago, a colleague currently taking the Coursera course about reactive programming in Scala asked me to explain what monads are. It’s always a tough question, and I rarely manage to give an understandable answer simply. This time though, I kind of managed to pass on some understanding of monads :

  1. When modelling a stateful data structure with immutable constructs, one has to pass the state into, and return it from, every function
  2. This results in a lot of repeated code to pass this state around
  3. With a monad, you can factor out this glue code and only write the ‘real’ code
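To make these three steps concrete, here is a loose, hypothetical Ruby analogue (all names are made up) : a stack ‘monad’ whose bind threads the immutable stack state, so client code never passes the state around by hand.

```ruby
# StackMonad holds the current immutable stack plus the last returned value.
StackMonad = Struct.new(:stack, :value) do
  def bind(action)
    action.call(stack) # action : current stack -> next StackMonad
  end
end

# Actions are lambdas from a stack to the next StackMonad
def push(item)
  ->(stack) { StackMonad.new([item] + stack, nil) }
end

def pop
  ->(stack) { StackMonad.new(stack.drop(1), stack.first) }
end

monad = StackMonad.new([], nil).bind(push(:a)).bind(push(:b)).bind(pop)
# monad.value is :b and monad.stack is [:a] : the state was threaded for us
```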

I thought it might be a good subject for a Java kata ! This is what I tried to do in java-monads-kata. Here is some sample monadic code from the kata itself :

@Test public void
pops_objects_in_reverse_push_order() {
  monad = monad.
    bind(push(A)).
    bind(push(B)).

    bind(pop());

  assertEquals(B, monad.value);

  monad = monad.bind(pop());
  assertEquals(A, monad.value);

  assertEquals(empty(), monad.stack);
}

You can have a look at all the final code, or go through the whole history to get the ‘kata’ feeling. It’s a shame Github does not offer a nice chronological repo history slideshow; for a better experience, I recommend using Chrome with Github Improved, which lets you view diffs right from the Github history.

The resulting code is still quite far from a Haskell monad :

  • Functions are not first class objects in Java. It is written in Java 7, without lambdas, which does not help either
  • Java does not have Haskell’s type class polymorphism : it only supports OO polymorphism and very little covariance
  • The whole monad thing, designed to simulate and isolate side effects, has a WTF feel in Java, where side effects are just everywhere

I’d love to hear some feedback about it.

Reviews for Everyone

We are using Scrum at work. As an eXtreme Programmer to the bone, I wanted more collective code ownership. We were already doing some pair programming from time to time, but I thought it might be a good time to try public code reviews.

I had already been doing code reviews in other jobs, but the experience had been disappointing up till now, for the following reasons :

  • Even with review tools, they involved too much manual effort
  • I’ve been in jobs with reinforcing loops : in this kind of environment, even mandatory code reviews tend to become a useless “tick in the box” operation
  • Often, they were used as a control mechanism rather than a sharing tool

Public reviews, as described by Karl Fogel in Producing Open Source Software, on the other hand, seem like a great way to encourage sharing and peer review. The principles are simple :

  • All commits must be reviewed by at least one person
  • Anyone can review anyone’s code

At work, we are using Perforce, with Code Collaborator as a review tool. We did not have the possibility to send an email on every Perforce submit, and manually creating Code Collaborator reviews for every change is a chore. So I spent an afternoon writing a small Ruby script that polls Perforce for new changes and automatically creates reviews in Code Collaborator from them. I also added something to spot existing Jira ids in commit messages, so as to extend the existing review instead of creating a new one for every commit.
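The script itself is not included here; a sketch of its core in Ruby, assuming the usual `p4 changes` output format and driving Code Collaborator through its command line (both assumptions), might look like this :

```ruby
# Jira issue ids look like "PROJ-123" : related commits share one review
JIRA_ID = /\b[A-Z][A-Z0-9]+-\d+\b/

def jira_id(commit_message)
  commit_message[JIRA_ID]
end

# Parse the (assumed) output of `p4 changes -s submitted`, one change per line :
#   Change 1234 on 2013/10/14 by user@client 'commit message'
def parse_changes(p4_output)
  p4_output.scan(/^Change (\d+) .* '(.*)'$/).map do |number, message|
    { number: Integer(number), message: message }
  end
end

# The polling loop itself would repeatedly shell out to (assumed) commands :
#   p4 changes -s submitted @last_seen,#head        -- fetch new changes
#   ccollab addchangelist <review-id> <change-nb>   -- attach to a review
# creating a new review unless jira_id(message) matches an open one.
```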

We are very pleased with the result : the whole team is participating in the reviews. As with all good code reviews, it helps with :

  • Spotting some bugs
  • Spotting some possible design improvements
  • Discussing the global design and architecture of our system
  • Gathering coding standards