Monthly Archives: November 2018

See how we create books. Live


We are in the middle of converting our bestselling book Rails Meets React from CoffeeScript to ES6. The book has turned out to be a huge help for many Rails developers looking for a sane way to build their front-end applications.

Many people do not use CoffeeScript but could still use the information from the book. This was the main improvement request we got from you. And we are providing this upgrade to everyone who will ever buy Rails Meets React.

Working on a book on a live stream

The process of rewriting the content may be interesting for people who want to convert their own React codebase from CoffeeScript to ES6. So I thought it might be a good time to experiment with video streaming.

There is one thing I like about gaming live streams: the streamer has a chance to interact with the audience in real time. There is a chat where everyone can ask a question. This is a great way to connect with an audience, and it makes me really interested in trying this out.

I started to think about how exactly the whole stream could look, and I got an idea.

Book creation process at Arkency – Rails Meets React


I want to show you part of our book-creation process: the tooling, the automation, the content creation. We have spent quite a bit of time making the whole technical process around our books quick and easy, and we don't want to keep it only to ourselves.

Here’s the plan of things I want to show:

  • How do we structure our books?
  • What does our book-building process look like?
  • The process of converting a chapter fragment from CoffeeScript to ES6 (with JSX)

The stream will start at 2pm UTC this Friday. You can watch it on YouTube. I will answer your questions in the chat during the event.

This is our first attempt at this kind of live-streaming event, and I hope it won't be the last! 🙂 There are many more things we would love to share with you!

And yes! The stream will be saved and published later on our YouTube channel.

Psst… and there will be a special offer for people watching the stream. 😉

Porting a Rails frontend from CoffeeScript to ES6 and JSX – examples


We are working on a new version of our bestselling book – Rails meets React.js. Currently the book is based on CoffeeScript, and we want to port it to ES6. The update will be free for everyone who bought the CoffeeScript version.

Here are some examples showing the process of porting from CoffeeScript to ES6 and JSX.

JSX

In the previous version of this book we didn't use JSX, because it does not fit well with CoffeeScript. With ES6 it is different: JSX fits very well here.

Before:

Stats = React.createClass
  contextTypes:
    user: React.PropTypes.object
  render: ->
    React.DOM.div null,
      "Won: #{@context.user.won}"
      React.DOM.br null
      "Lost: #{@context.user.lost}"
      React.DOM.br null
      React.DOM.a
        href: "http://www.example.org/stats/#{@context.user.id}",
        "(see the whole stats of #{@context.user.name})"

stats = React.createFactory(Stats)

After:

class Stats extends React.Component {
  render() {
    return (
      <div>
        Won {this.context.user.won}
        <br />
        Lost {this.context.user.lost}
        <br />
        <a href={`http://www.example.org/stats/${this.context.user.id}`}>
          (see the whole stats of {this.context.user.name})
        </a>
      </div>
    );
  }
}

Note the ES6 template literal `http://www.example.org/stats/${this.context.user.id}`, which we use to construct the URL string.

ES6 classes

Instead of using React.createClass we are now using ES6 class syntax:

Before:

OneTimeClickLink = React.createClass
  render: ->
    React.DOM.div(
      {id: "one-time-click-link"},
      React.DOM.a(
        {href: "javascript:void(0)"},
        "Click me"
      )
    )

After:

class OneTimeClickLink extends React.Component {
  render() {
    return (
      <div id="one-time-click-link">
        <a href="javascript:void(0)">
          Click me
        </a>
      </div>
    );
  }
}

State initialization

Since getInitialState does not work with the class syntax, we initialize state in the constructor, like in this example:

Before:

DateWithLabel = React.createClass
  getInitialState: ->
    date: new Date()
  render: ->
    DOM.div

After:

class DateWithLabel extends React.Component {
  constructor(props) {
    super(props);
    this.state = {
      date: new Date()
    };
  }

  render() {
    return (<div></div>);
  }
}

Bind instance methods

With the class syntax, React doesn't bind all methods automatically, so we bind them in the constructor.

Before:

OnOffCheckbox = React.createClass
  getInitialState: ->
    toggled: false

  toggle: ->
    @setState toggled: !@state.toggled

  render: ->
    React.DOM.input
      key: 'checkbox'
      type: 'checkbox'
      id: @props.id
      checked: @state.toggled
      onChange: @toggle

After:

class OnOffCheckbox extends React.Component {
  constructor(props) {
    super(props);
    this.state = {
      toggled: false
    };
    this.toggle = this.toggle.bind(this);
  }

  toggle() {
    this.setState({ toggled: !this.state.toggled });
  }

  render() {
    return (
      <input
        key="checkbox"
        type="checkbox"
        id={this.props.id}
        checked={this.state.toggled}
        onChange={this.toggle}
      />
    );
  }
}

Default props and validations

defaultProps, propTypes and contextTypes must be defined outside the class body. Here are some examples:

Before:

OnOffCheckbox = React.createClass
  getDefaultProps: ->
    initiallyToggled: false

  getInitialState: ->
    toggled: @props.initiallyToggled

  toggle: ->
    @setState toggled: !@state.toggled

  render: ->
    React.DOM.input
      key: 'checkbox'
      type: 'checkbox'
      id: @props.id
      checked: @state.toggled
      onChange: @toggle

After:

class OnOffCheckbox extends React.Component {
  constructor(props) {
    super(props);
    this.state = {
      toggled: props.initiallyToggled
    };
    this.toggle = this.toggle.bind(this);
  }

  toggle() {
    this.setState({ toggled: !this.state.toggled });
  }

  render() {
    return (
      <input
        key="checkbox"
        type="checkbox"
        id={this.props.id}
        checked={this.state.toggled}
        onChange={this.toggle}
      />
    );
  }
}

OnOffCheckbox.defaultProps = {
  initiallyToggled: false
};

The same goes for propTypes:

Before:

Blogpost = React.createClass
  propTypes:
    name: React.PropTypes.string.isRequired

  # ...

After:

class Blogpost extends React.Component {
  // ...
}

Blogpost.propTypes = {
  name: React.PropTypes.string.isRequired
};

Summary

Code written in ES6 with JSX can be pretty clean. I used ESLint, which helped me keep the syntax clean and free of errors. Here you can find a blog post on how to configure it correctly for your editor.

If you are interested in learning how to use React with Rails, with the ES6 syntax, we are working on a new version of Rails meets React.js. It will be free for everyone who bought the previous, CoffeeScript-based version.

One more step to DDD in a legacy rails app

Recently I picked up a ticket from the support team of one of our clients. A few months ago VAT rates changed in Norway: 5% became 10% and 12% became 15%. This has implications for platform users (event organizers), since they choose which VAT rate applies to the products they offer to ticket buyers. You'll learn why I didn't just update a DB column.

Current state of the app

This app is a great example of legacy software. It's successful and earns a lot of money, but it has some areas of code which haven't been cleaned up yet. There's a concept of an organization in the codebase, which represents a given country market. The organization has an attribute called available_vat_rates, which is simply a serialized attribute keeping VatRate value objects. I won't focus on this object here, since its implementation is not the point of this post. It works in a really simple manner:

vat_rate = VatRate.new(15)
# => #<VatRate:0x007fdb8ed50db0 @value=15.0, @code="15">

vat_rate.code
# => "15"

vat_rate.to_d
# => #<BigDecimal:7fdb8ed716c8,'0.15E2',9(27)>

VatRate objects are Comparable, so you can easily sort them; a pretty neat solution.
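
For illustration, a minimal sketch of such a value object could look like the code below. The details are my assumption; the real implementation in the app is richer.

require 'bigdecimal'

class VatRate
  include Comparable

  attr_reader :code

  def initialize(value)
    @code  = value.to_s
    # non-numeric rates like 'NoVat' fall back to zero in this sketch
    @value = BigDecimal(value.to_s) rescue BigDecimal("0")
  end

  def to_d
    @value
  end

  def <=>(other)
    to_d <=> other.to_d
  end
end

VatRate.new(15).code                                 # => "15"
[VatRate.new(25), VatRate.new(10)].sort.map(&:code)  # => ["10", "25"]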

An event organizer who creates e.g. a ticket can choose a valid VAT rate applying to their product. Then, after a purchase is made, the ticket buyer receives an e-mail with a receipt. This obviously also has side effects in the financial reporting.

So what’s the problem?

I could simply write a migration that adds the new VAT rates, removes the old ones and updates the events' products which use the old rates. However, no domain knowledge would be handed down about when the change was made and what kind of change happened. You simply can't get that information from the updated_at column in your database. We have nice domain-facts “coverage” around the event concept in the application, so we're well informed there. We don't have such knowledge in regard to the Organization.

Start with a plan

I simply started by making a plan for this upgrade.

  1. I’ve checked whether the change made to available_vat_rates will be represented properly in the financial reports.
  2. I’ve checked how many products had the old VAT rates set.
  3. I’ve introduced new domain events called Organization::VatRateAdded and Organization::VatRateRemoved, which are published to the Organization$organization_id stream (sketched right after this list).
  4. I’ve run a migration which added the new VAT rates (10% & 15%) and published the corresponding domain facts — let’s call it step 1.
  5. I’ve performed an upgrade of the VAT rates on the products which required it – step 2.
  6. I’ve run a migration which removed the old VAT rates (5% & 12%) and published domain facts – step 3.
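
A rough sketch of the two new domain events could look like the code below, assuming RailsEventStore-style event classes; the base class, publish signature and stream handling depend on the gem version, and the locals used here come from the migration shown later.

module Organization
  class VatRateAdded < RailsEventStore::Event
  end

  class VatRateRemoved < RailsEventStore::Event
  end
end

# Published to the per-organization stream, e.g. "Organization$123";
# the stream_name keyword is illustrative and varies between versions.
event_store = Rails.application.config.event_store
event_store.publish(
  Organization::VatRateAdded.new(
    organization: organization.id,
    vat_rate_code: 10,
    originator_id: originator_id
  ),
  stream_name: "Organization$#{organization.id}"
)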

Step 1 – adding new VAT rates

require 'event_store'

class AddNewVatRatesToNoOrganization < ActiveRecord::Migration
  # … minimum viable implementations of used classes

  def up
    event_store   = Rails.application.config.event_store

    organization  = Organization.find_by(domain: 'me.no')
    originator_id = User.find_by(email: 'me@me.no').id

    organization.available_vat_rates = [
      VatRate.new('NoVat'),
      VatRate.new(5),  # deprecated one
      VatRate.new(10), # new one
      VatRate.new(12), # deprecated one
      VatRate.new(15), # new one
      VatRate.new(25),
    ]

    if organization.save
      event_store.publish(Organization::VatRateAdded.new(
        organization: organization.id,
        vat_rate_code: 10,
        originator_id: originator_id
      ))
      event_store.publish(Organization::VatRateAdded.new(
        organization: organization.id,
        vat_rate_code: 15,
        originator_id: originator_id
      ))
    end
  end
end

Two things worth noticing happen here. The event data contains originator_id; I simply passed my own user_id there, to leave other team members information in the event store about the person who performed the change (audit-log purposes). The second thing is that I keep the old VAT rates available for now, in case any event organizer is performing changes on their products at that moment, to prevent errors and a partially migrated state.

Step 2 – migrating affected product data

The number of products which required a change of VAT rates was so small that I simply used the web interface to update them. Normally I would just go with the EventService and an UpdateTicketTypeCommand containing all the necessary data.
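
For illustration, that command-driven path could look roughly like this. EventService and UpdateTicketTypeCommand are named in the text above; the attributes, the TicketType model and the internals are my assumptions.

class UpdateTicketTypeCommand
  attr_reader :ticket_type_id, :vat_rate_code

  def initialize(ticket_type_id:, vat_rate_code:)
    @ticket_type_id = ticket_type_id
    @vat_rate_code  = vat_rate_code
  end
end

class EventService
  def update_ticket_type(command)
    ticket_type = TicketType.find(command.ticket_type_id)
    ticket_type.update!(vat_rate: VatRate.new(command.vat_rate_code))
    # ...and publish a domain event describing the change here
  end
end

EventService.new.update_ticket_type(
  UpdateTicketTypeCommand.new(ticket_type_id: 123, vat_rate_code: 10)
)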

Step 3 – remove deprecated VAT rates

require 'event_store'

class RemoveOldVatRatesFromNoOrganization < ActiveRecord::Migration
  # … minimum viable implementations of used classes

  def up
    event_store   = Rails.application.config.event_store

    organization  = Organization.find_by(domain: 'me.no')
    originator_id = User.find_by(email: 'me@me.no').id

    organization.available_vat_rates = [
      VatRate.new('NoVat'),
      VatRate.new(10),
      VatRate.new(15),
      VatRate.new(25),
    ]

    if organization.save
      event_store.publish(Organization::VatRateRemoved.new(
        organization: organization.id,
        vat_rate_code: 5,
        originator_id: originator_id
      ))
      event_store.publish(Organization::VatRateRemoved.new(
        organization: organization.id,
        vat_rate_code: 12,
        originator_id: originator_id
      ))
    end
  end
end

Summary

All the products on the platform now have proper VAT rates set, and the organization has the proper list of available VAT rates. Last but not least, we know exactly what happened and when, we have a better understanding of the domain, and we started publishing events for another bounded context of our app. If you're still not convinced about publishing domain events, please read Andrzej's post on that topic, or even better, watch his keynote From Rails legacy to DDD given at wroc_love.rb.

The story behind our Rails TDD & DDD video class

Get the Rails TDD video class now

I was asked how long the end result is going to be.

It's hard to say, but I suppose I'm somewhere around 33% of the planned scope, which may mean the end result will be about 3 hours. Actually, I'm having a lot of fun doing this, so there might be more 🙂 I don't want to overpromise at the moment, though.

Some of the people who started watching the videos asked me to describe more of the context behind the starting point.

As it is right now, the videos start with requirements to implement. The requirements are given by the mysterious Jane. Who is she? What are we doing here as a programmer?

The story hasn't really been told in the videos yet. I postponed it, assuming that the technical content is more important. However, now I see that the story is also intriguing to many of you 🙂 That's good.

First, let me explain why there is a story here at all.

Why do we need some artificial setting (a company, other developers) to be introduced while learning about TDD/DDD?

Well, it's just me having fun after taking a writing class. Yes, that's right: over the last months I took a writing class (as in, writing fiction). Due to lack of time, I failed at actually participating in the class by doing the homework. However, I was in read-only mode and learnt a lot about stories, narration, introducing the context to the reader, etc.

I'm still far from being able to write even a 2-page fiction story, but I'm some steps closer to having fun while creating such narrations.

I wanted to make this class, but I also wanted to put it in an imaginary situation which would be close to the programmers learning from it.

What can be closer than imagining ourselves starting a new job as a programmer? That's how it evolved. I imagined a company called BioRubyCorp. The main character was just hired as a Ruby programmer. Jane is an existing developer at the company. She's the developer who knows about all the stuff, but she's got too many things on her plate, so you (the narrating character played by me) were hired to help her.

At the beginning you don't know much about the company itself from the business perspective. You only know that they operate in the Genetics domain, which is completely foreign to you. In fact, it's a domain also foreign to me, so I'm just learning along the way. Frankly speaking, it is a surprising side effect to me that learning about DNA and nucleotides became such a fascinating topic!

Why was the Genetics domain chosen? What's the story behind it?

In order to tell the full story, I need to tell you about the ongoing Junior Rails Developer class that we have been running for the last 3 months. It's an online, Slack-based class which has its roots in the Rails learning program which I developed over 5 years of teaching Rails at the University of Wroclaw (BTW, students of this university are regular medal-winning participants of team-programming contests around the world).

The first student who was accepted to the Junior Rails Developer class was Janek. Some of the first exercises to do in the class are taken from exercism.io, and several of them are about the DNA domain. During the class, we help the student. What struck me was that even our team was suggesting solutions based on the shortest code instead of ones which would be more domain-revealing. That made me contemplate what's better in such an exercism context. What also bothered me was that even though the requirements were written in a nice domain language, the code was just strings and lists. Primitive obsession! I wanted to see if there's a way of implementing it with more abstractions. You can see the result in this free video:

I was quite happy with the result. I also became fascinated with this domain! So that’s one part of the story – why the Genetics domain.

Let’s come back to the setting of the narration.

There's the main character (me/you) and Jane. There's also the (as yet unnamed) domain expert who helps us understand the domain. This kind of narration helps me build a typical situation in a company. It builds a good foundation for the next requirements and the planned plot twists.

To be honest, the story is not fully prepared and planned. You can also influence it by suggesting to me what kind of plot twist would be “typical” in such situations. Which requirement to implement? How to deal with deadlines? How to deal with an empty budget?

The story is just the framing, though.

The real goal of the class is to teach you how to combine Rails with TDD. Other things, like DDD and mutant, are also crucial. I don't want to show the one and only true way; I'm going to present more techniques over time. I like to think of this class as a TV series, easy to consume in short episodes every day.

I hope this gives you more context now 🙂 Buy here: https://vimeo.com/ondemand/arkencyruby . Imagine the first day at BioRubyCorp and start watching the videos!

Thanks, Andrzej

70% off the Rails / TDD / DDD / mutant video class until 11pm 19.05.2016 CEST

Hi,

Just 24 hours left for the 70% discount on the Rails/TDD/DDD/mutant video class. You can buy it here: https://vimeo.com/r/1I82/STcxRjFuel

Thanks to everyone who already bought the video class!

The number of sales is so overwhelming that Vimeo wrote this to me today:

[Screenshot: the message from Vimeo]

Believe me, it’s really cool to receive such messages – thank you so much!

To celebrate the results, I’ve decided to make some of the videos FREE so that those of you who haven’t decided yet can have a look:

Q&A

Is the code available?

Many people asked me whether the code from the videos will also be available.

Yes! It's definitely going to be available. The minor blocker for now is that I'm not sure how to get the customers' emails (or any other way of contacting them) so that I can collect your GitHub logins and give you access to the GitHub repository. I'm talking to the Vimeo crew about it!

Is it going to be about Rails or just Ruby?

The first ~20 videos don't touch Rails yet. That's by design 😉 The story will reveal itself to the point where we'll switch to Rails in an interesting way 😉 The goal of this class is to show you how to do TDD with a Rails app!

Is it again something about JavaScript, or is it just Ruby/Rails this time?

I guess the people asking this question may have missed our React.js love 😉 This time, we'll spare you React.js or any other JavaScript things. No React.js this time 😉 (Those of you who are disappointed, stay tuned; we're cooking something React/Rails-related for you as well, in another form.)

Testimonials

Here are some of the things people wrote to me after watching some of the videos:

  • “It’s a great idea with the class, I bought it, watched the first videos and already learnt a lot, thanks!”
  • “I’m really enjoying the course.”
  • “I am also learning cool tricks for Rubymine. Thank you.”
  • “It is very pleasant to watch. Nice flow – not too fast and not too slow.” (translated from Polish)

Decided to buy?

Just click this link: https://vimeo.com/r/1I82/STcxRjFuel

BTW, I've just checked with Vimeo, and it looks like we can keep our refund policy for this class as well. Just to remind you: all Arkency products (ebooks/videos) can be refunded at any time, no questions asked. If you buy this class and then decide it's not for you, just send us an email and we'll issue a refund.

Domain Events over Active Record Callbacks

Recently Marcin wrote an article about ActiveRecord callbacks being the biggest code smell in Rails apps, one that can easily get out of control. It was posted on Reddit, and a very interesting comment appeared there:

Imagine an important model, containing business vital data that changes very rarely, but does so due to automated business logic in a number of separate places in your controllers.

Now, you want to send some kind of alert/notification when this data changes (email, text message, entry in a different table, etc.), because it is a big, rare, change that several people should know about.

Do you:

A. Opt to allow the Model to send the email every time it is changed, thus encapsulating the core functionality of “notify when changes” to the actual place where the change occurs?

Or

B. Insert a separate call in every spot where you see a change of that specific Model file in your controllers?

I would opt for A, as it is a more robust solution, future-proof, and most to-the-point. It also reduces the risk of future programmer error, at the small cost of giving the model file one additional responsibility, which is notifying an external service when it changes.

The author brings very interesting and very good points to the table. I myself used a callback just like that a few months ago:

class Order < ActiveRecord::Base
  after_commit do |order|
    Resque.enqueue(IndexOrderJob,
      order.id,
      order.shop_id,
      order.buyer_name,
      order.buyer_email,
      order.state,
      order.created_at.utc.iso8601
    )
  end
end

It schedules indexing in an Elasticsearch database. It was the fastest solution to our problem, and I did it knowing that it does not bring us any further in terms of improving our codebase. But I also knew that at the same time we were doing other things which would help us get rid of that code later.

So despite the undeniable usefulness of those callbacks, let's talk about a couple of problems with them.

They are not easy to get right

Imagine very similar code such as:

class Order < ActiveRecord::Base
  after_save do |order|
    Elasticsearch::Model.client.index(
      id: id,
      body: {
        id:         id.to_s,
        shop_id:    shop_id,
        buyer_name: buyer_name,
        email:      buyer_email,
        state:      state,
        created_at: created_at
      })
  end
end

At first sight everything looks all right. However, if the transaction gets rolled back (saving an Order can be part of a bigger transaction that you open manually), you will have indexed an incorrect state in the second database. You can either live with that or switch to after_commit.

Also, what happens if we get an exception from Elasticsearch? It would bubble up and roll back our DB transaction as well. You can think of it as a good thing (we won't have inconsistent DBs: there is nothing in Elasticsearch and nothing in the SQL db) or as a bad thing (an error in the less important DB prevented someone from placing an order, and us from earning money).

So let's switch to after_commit, which might be better suited to these particular needs. After all, the documentation says:

These callbacks are useful for interacting with other systems since you will be guaranteed that the callback is only executed when the database is in a permanent state. For example after_commit is a good spot to put in a hook to clearing a cache since clearing it from within a transaction could trigger the cache to be regenerated before the database is updated

In other words: after_commit is a safer choice if you use those hooks to integrate with 3rd-party systems/APIs/DBs. after_save and after_update are good enough if the side effects are stored in the SQL db as well.

class Order < ActiveRecord::Base
  after_commit do |order|
    Elasticsearch::Model.client.index(
      id: id,
      body: {
        id:         id.to_s,
        shop_id:    shop_id,
        buyer_name: buyer_name,
        email:      buyer_email,
        state:      state,
        created_at: created_at
      })
  end
end

So we know to use after_commit. Now, most of our tests are probably transactional, meaning they are executed inside a DB transaction because that is the fastest way to run them. Because of that, those hooks won't be fired in your tests. This can be a good thing, because you won't be bothered with a feature that might be of interest to only a very few tests. Or a bad thing, if there are many use cases in which you need that data stored in Elasticsearch for testing. You will either have to switch to a non-transactional way of running tests, use the test_after_commit gem, or upgrade to Rails 5.
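
For example, with RSpec on Rails 4.x one way could look roughly like the sketch below; the exact setting name depends on your Rails/RSpec versions.

# spec/rails_helper.rb
RSpec.configure do |config|
  # Run examples outside a wrapping DB transaction so after_commit fires.
  config.use_transactional_fixtures = false
end

# ...or keep transactional tests and add to the Gemfile instead (Rails 4.x):
# gem 'test_after_commit', group: :test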

Historically (read: in legacy Rails apps) exceptions from after_commit callbacks were swallowed and only logged, because what can you do when everything is already committed? This has been fixed since Rails 4.2; however, your stacktrace might not be as good as you are used to.

So we know that most of the technical problems can be dealt with one way or another, but you need to be aware of them. Exceptions are the most problematic part, and you need to handle them somehow.

They increase coupling

Here is my gut feeling when it comes to Rails and most of its problems: there are not enough technical layers in it by default. We have views (not interesting at all in this discussion), controllers and models. So by default, when you want to trigger a side effect of an action, the only choice you have is between the controller and the model. That's where we can put our code. Both have some problems.

If you put your side effects (API calls, caching, 2nd DB integration, mailing) in controllers, you might have a problem testing them properly, for two reasons. Controllers are tightly coupled with the HTTP interface, so to trigger them in tests you need to go through the HTTP layer. And instantiating your controllers and calling their methods directly in tests is not easy; they are managed by the framework.

If you put the side effects into your models, you end up with a different problem. It's hard to test the domain models without those other integrations (obviously), because they are hardcoded there. So you must either live with slower tests or mock/stub them all the time.

That's why there are plenty of blog posts about Service Objects in the Rails community. When the complexity of an app rises, people want a place to put after-save effects like sending an email or notifying a 3rd-party API about something interesting. In other communities and architectures those parts of code would be called a Transaction Script or an Application/Domain/Infrastructure Service. But by default we are missing them in Rails. That's why everyone (who needs them) is re-inventing services based on blog posts, using gems (there are at least a few), or using new frameworks (Hanami, Trailblazer) which don't forget about this layer. You can read our Fearless Refactoring book to learn how to start introducing them in your code without migrating to a new framework. It's a great step before you start introducing more advanced concepts to your system.
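
To make that more concrete, a bare-bones application service could look like the sketch below; the class, mailer and indexer names are purely illustrative.

class RegisterUser
  def initialize(mailer: UserMailer, indexer: SearchIndexer.new)
    @mailer  = mailer
    @indexer = indexer
  end

  def call(params)
    user = User.create!(params)
    @mailer.welcome(user.id).deliver_later  # side effects live here...
    @indexer.index(user)                    # ...instead of in AR callbacks
    user
  end
end

# called from a thin controller action:
# RegisterUser.new.call(user_params)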

They miss the intention

When your callback is called, you know that the data changed, but you don't know why. Was the Order placed by the user? Was it placed by a POS operator, which is a different process? Was it paid, refunded, cancelled? We don't know. Or we do, based on a state attribute, which in many cases is an antipattern as well. Sometimes it is not a problem that you don't know this, because you just send some data in your callback. Other times it can be a problem.

Imagine that when a User is registered via an API call from mobile, or by using a different endpoint in a web browser, we want to send them a welcome email. Also when they join from Facebook. But not when they are imported into our system because a new merchant decided to move their business, with their customers, to our platform. In 3 situations out of 4 we want a given side effect (sending an email), and in one case we don't. It would be nice to know the intention of what happened in order to handle that. after_create is just not good enough.

Domain Events

What I recommend instead of using Active Record callbacks is publishing domain events such as UserRegisteredViaEmail, UserJoinedFromFacebook, UserImported, OrderPaid and so on… and having handlers subscribed to them which can react to what happened. You can use one of the many pub/sub gems for that (e.g. wisper), or the rails_event_store gem if you additionally want to have the events saved in a database and available for future inspection, debugging or logging.
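
A minimal sketch of that approach with rails_event_store could look like this; the exact subscribe/publish signatures differ between gem versions, and the class names are illustrative.

class OrderPaid < RailsEventStore::Event
end

class SendReceipt
  def call(event)
    # e.g. OrderMailer.receipt(event.data.fetch(:order_id)).deliver_later
  end
end

event_store = Rails.application.config.event_store
event_store.subscribe(SendReceipt.new, to: [OrderPaid])

# in the application service that finalizes the purchase:
event_store.publish(OrderPaid.new(data: { order_id: order.id }))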

If you want to know more about this approach, you can now watch my talk: 2 years after the first domain event – the Saga pattern. I describe how we started publishing domain events and using them to trigger side effects. You can use that approach instead of AR callbacks.

After some time, whenever something changes in your application, you have an event published, and you don't need to look for the places changing a given model, because you know all of them.

P.S.

It only gets worse in Rails 5

The anatomy of Domain Event

Almost 2 years and over 16 million domain events ago I started a process of “switching the mindset”. I had no production experience with Event Sourcing (BTW, it is still used only in some parts of the application, but that's a topic for another post) and only limited experience with Domain-Driven Design (mainly knowing the tactical patterns). During that time, a lot has changed.

Start from the middle

I started introducing new concepts into our project's code base from the middle. Not with Domain-Driven Design, not with Event Sourcing or CQRS (Command Query Responsibility Segregation). It all started with just publishing Domain Events.

Trials & errors

The more Domain Events we “published”, the better understanding of our domain I got. I also started to better understand the core concepts of Domain-Driven Design; terms like bounded context, context map and others from the strategic patterns of DDD started to make more sense and become more and more important. Of course, I made a few mistakes, and some of them still bite us because of decisions made almost 2 years ago 😉

Our first ever published domain event is:

RailsEventStore::Client.new.read_all_streams_forward(:head, 1)
=> ProductItemEvents::ProductItemSold
    @event_id="74eb88c0-8b97-4f27-9234-ed390f72287c",
    @metadata={:timestamp=>2014-11-12 22:20:24 UTC},
    @data={:order_id=>23456, :product_item_id=>123456,
           :attributes=>{
             "id"=>123456, "order_id"=>23456, "product_type_id"=>98765,
             "price"=>50, "barcode"=>"1234567890",
             "scanned_at"=>nil, "serialized_type"=>nil,
             "order_line_id"=>3456789, "code_id"=>nil,
             "updated_at"=>2014-11-12 22:20:24 UTC, "created_at"=>2014-11-12 22:20:24 UTC}}

Here is a set of rules / things to consider when you build your domain events. Each of them is based on a mistake I made 😉

Rule #1: the naming is hard, really hard

In an old talk I gave at a developers' user group meetup, I defined a domain event as:

  • Something that has already happened, and therefore…
  • Should be named in the past tense…
  • … and in business language (Ubiquitous Language)
  • Represents a state change,
  • Something that will never change

The name of a domain event is extremely important. It is the “definition” of the event for others. It brings a lot of value when defined right, but it can be misleading when it doesn't capture the exact business change.

In the example above the name of the domain event is ProductItemSold, and this name is not the best one. The application domain is not selling some products but selling tickets for events (actually that's a huge simplification, but it does not matter here). We do not sell products. We sell tickets. This domain event should be named TicketSold. Sure, we could also sell some other products, but then it should be a different domain event.

Rule #2: don’t be CRUDy

There are very few domains where something is really created. Every time I see a UserCreated domain event, I feel that this is not the case. The user might be registered, the user might be imported; I don't know of a case where we really create a user (he or she already exists ;P). Don't stop when your domain expert tells you that something is created (updated or deleted). It is usually something more, something that has a real business meaning.

And one more thing: don't talk CRUD to your domain expert / customer. When they start to talk CRUD back to you, you are in serious trouble.

You know what’s the biggest tragedy in software engineering?

The customers gave up and learnt to speak CRUD to developers.

— Andrzej Krzywda (@andrzejkrzywda) October 3, 2015

Check out Udi Dahan's blog post where he explains this in more detail.

Rule #3: your event is not your entity

You might have spotted the attributes key in the data argument of our first domain event. This is the thing I dislike most in that domain event. Why? Because it creates a coupling between our domain event and our database schema. Every kind of coupling is bad, especially when you try to build a loosely coupled, event-driven architecture for your application. The attributes of a domain event are its contract. It is not something you could or should easily change; there could be parts of the system that rely on that contract, and changing it is always trouble. Avoid that by applying Rule #4.

Rule #4: be explicit

The serialized_type? Is this business language? Really? Or does the business care about scanned_at when a ticket has just been sold? I don't, and none of the event handlers for this event care. That's just pure garbage. It holds no meaningful information here; it just messes with your domain event contract, making it less usable and more complicated. An explicit definition of your domain event's attributes will not only let you avoid those unintentional things in the domain event schema, but will also force you to think about what really should be included in the event's data.

=> TicketSold
    @event_id="74eb88c0-8b97-4f27-9234-ed390f72287c",
    @metadata={:timestamp=>2014-11-12 22:20:24 UTC},
    @data={:barcode=>"1234567890",
           :order_id=>23456, :order_line_id=>3456789,
           :ticket_type_id=>98765,
           :price=>{Price value object here}}

This is the modified version of my first domain event. Much cleaner. All important data is explicit, with a clearly defined contract of what to expect. Maybe some more refactoring could be applied here (a TicketType value object and an OrderSummary value object that would encapsulate the ids of other aggregates). Also, an important attribute was revealed here: the ticket's barcode.

Rule #5: natural is better

With an explicit definition of the domain event schema, it is easier to notice that we do not need to rely on the ticket's database id (product_item_id), because we already have a natural key to use: the barcode. Why is natural better? Natural keys are part of the ubiquitous language; they are the identifiers of the objects that you and your domain expert will understand and use when you talk about them. They will also, in most cases, be used in your application UI (if not, you should rethink your user experience). When you want to print the ticket, you use the barcode as identification. When you validate the ticket at the venue entrance, you scan the barcode. When some guest has trouble with their ticket, your support team asks for the barcode (…or the order number, or the guest's name, if you are doing it right 😉 ). The barcode is the identification of the ticket. The database record's id is not. Don't let your database leak through your domain.

Rule #6: time is a modelling factor

“Modelling events forces temporal focus”

Greg Young, DDD Europe 2016 talk

How things correlate over time, and what happens when this happens before that, becomes a real domain problem. Very often our understanding of the domain is very naive. If you don't include time as a modelling factor, your model might not reflect what is happening in the real world.

Rule #7: when in doubt

TALK TO YOUR DOMAIN EXPERT / BUSINESS

Want to learn more?

Struggling to find a direction for refactorings that keep your app maintainable? Is a legacy codebase driving you nuts and not revealing its intentions?

Join our Rails + Domain Driven Design Workshop. The next edition will be held on 12-13th January 2017 (Thursday & Friday), in Wrocław, Poland.

Packaging ruby programs in NixOS

Recently at Arkency we have been exploring how NixOS could fit our infrastructure. A few weeks ago we switched most of our projects' CI systems from CircleCI to Buildkite. Buildkite has a unique offering in which it is us who provide the infrastructure and containers to test our applications, so we decided to set up these containers with NixOS, and so far we are happy with this solution.

In this post, however, I would like to describe how to package a simple Ruby program, based on the simple password-manager CLI utility pws. I chose such a simple program to reduce the scope of this post; I hope that in the future I'll be able to describe more complicated cases.

Nix & NixOS

Firstly, we should get to know what the packaging process in NixOS looks like (without going into details). A typical package in NixOS is a directory with a default.nix file (and maybe others). *.nix files are written in Nix, which is a functional programming language and package manager. A default.nix file is a function which takes configuration options and dependencies as input and creates a package as output.

Just to recap the syntax of the Nix language, you should know a few things:

  • in Nix we use records extensively. Records are what we know as a Hash in the Ruby world, and they look like this: { foo = "some value"; bar = 3; }
  • function application is very Haskell-ish and looks like this: name_of_the_function argument. Most often, the argument is a record with many fields.
  • the definition of a function usually looks like this: {arg1, arg2, arg3}: body_of_the_function – which is essentially a function with one record argument destructured into arg1, arg2, arg3. Think of Ruby's def my_function(**args).
  • Nix is lazily evaluated

Simplest ruby package

Our program is distributed as a gem, so we can take advantage of the fact that NixOS has a built-in bundlerEnv function for creating Bundler environments as packages. After checking out what other Ruby programs' packages look like, we can write the following function:

{ bundlerEnv, ruby }:

bundlerEnv {
  name = "pws-1.0.6";

  ruby = ruby;

  gemfile  = ./Gemfile;
  lockfile = ./Gemfile.lock;
  gemset   = ./gemset.nix;
}

As you can see, it's a simple function. Its argument, a record, has two fields: bundlerEnv and ruby. The body of this function does only one thing: it calls the bundlerEnv function with a record as an argument. This record has the fields name, ruby, gemfile, lockfile and gemset.

About the fields from the argument record: bundlerEnv is the aforementioned function creating a Bundler environment, and ruby is… Ruby, a program. You have to get used to the fact that programs are just values passed in as arguments to other functions.

There are a few important things right here:

  • gemfile, lockfile and gemset are references to the files we haven’t defined yet. We’ll create them in a second.
  • ruby = ruby may look funny at first, but think about scopes: it is a definition of the field ruby, whose value is the Ruby program. Think of a { ruby: ruby } Hash in Ruby – it's perfectly valid.

The bundlerEnv function requires a few things to work: the Gemfile and Gemfile.lock files of the Bundler environment it needs to build, and a gemset.nix file.

The easy part is creating the Gemfile and Gemfile.lock. Let's create the simplest possible Gemfile containing the pws gem:

source 'https://rubygems.org'

gem 'pws'

Let's generate the Gemfile.lock file by running the bundle install command.

Now, if we have both Gemfile and Gemfile.lock in one directory, we can generate gemset.nix using the Bundix tool. gemset.nix is basically a Gemfile.lock, but written in the Nix language. We need it because we want Nix to know what the dependencies of our package are. Note that Bundix is not yet finished, and thus is not able to translate less-used Gemfile features such as the path: attribute.

An exemplary gemset.nix looks like this:

{
  clipboard = {
    source = {
      remotes = ["https://rubygems.org"];
      sha256 = "11r5xi1fhll4qxna2sg83vmnphjzqc4pzwdnmc5qwvdps5jbz7cq";
      type = "gem";
    };
    version = "1.0.6";
  };
  paint = {
    source = {
      remotes = ["https://rubygems.org"];
      sha256 = "1z1fqyyc2jiv6yabv467h652cxr2lmxl5gqqg7p14y28kdqf0nhj";
      type = "gem";
    };
    version = "1.0.1";
  };
  pbkdf2-ruby = {
    source = {
      remotes = ["https://rubygems.org"];
      sha256 = "014vb5k8klvh192idqrda2571dxsp7ai2v72hj265zd2awy0zyg1";
      type = "gem";
    };
    version = "0.2.1";
  };
  pws = {
    source = {
      remotes = ["https://rubygems.org"];
      sha256 = "1brn123mmrw09ji60sa13ylgfjjp7aicz07hm9h0dc3162zlw5wn";
      type = "gem";
    };
    version = "1.0.6";
  };
}

Currently our function generates a Bundler environment, and if we released it this way, the pws program would be able to run. However, the current default.nix has two major disadvantages:

  • it doesn't include runtime dependencies. On Linux, pws needs the xsel command to run, and if the user doesn't install this program themselves, pws will be useless
  • as it's a Bundler environment, it exposes the following binaries: pws, bundle and bundler. You probably don't want to surprise someone by providing the bundler binary when they install your package.

Wrapper package

Thus, let's create a wrapper package which will just use the generated Bundler environment and provide only pws as a binary.

Our wrapper package can be achieved by the following code:

{ bundlerEnv, ruby, stdenv, makeWrapper }:

stdenv.mkDerivation rec {
  name = "pws-1.0.6";

  env = bundlerEnv {
    name = "pws-1.0.6-gems";

    ruby = ruby;

    gemfile  = ./Gemfile;
    lockfile = ./Gemfile.lock;
    gemset   = ./gemset.nix;
  };

  buildInputs = [ makeWrapper ];

  phases = ["installPhase"];

  installPhase = ''
    mkdir -p $out/bin
    makeWrapper ${env}/bin/pws $out/bin/pws
  '';
}

mkDerivation from stdenv is a function which is usually used to create packages (the simple ones) written for example in C or Bash.

As you can see, the env field is pretty much the same Bundler environment we used before, but now it's only a part of our package.

The phases field is an array which keeps the list of phases needed to build this package. It's mostly convention-driven, and in the end we can use arbitrary names there. We declare that the process of building our package consists of only one phase (named installPhase).

installPhase itself is just a simple 2-line Bash script. It uses the makeWrapper function provided by NixOS for exactly this kind of situation: it generates a simple script which does only one thing – exec the program given as an argument.

If we check the source of the generated file, it looks like this:

#!/nix/store/1qg54rgrk0sm04fqjixm64hn94kxhvzk-bash-4.3-p42/bin/bash -e
exec /nix/store/slxvwr8zgl2ajzjhb8692kp7mch978v7-pws-1.0.6-gems/bin/pws "${extraFlagsArray[@]}" "$@"

Runtime dependency

There's only one issue left with our current package, which I mentioned before: to run properly, pws needs xsel (a clipboard utility). However, it's only a run-time dependency. A run-time dependency (as opposed to a build-time dependency) means that we can successfully build the package without it, but our program will misbehave when the dependency is not present.

That’s why we want to modify our package a little bit:

{ bundlerEnv, ruby, stdenv, makeWrapper, xsel }:

stdenv.mkDerivation rec {
  name = "pws-1.0.6";

  env = bundlerEnv {
    name = "pws-1.0.6-gems";

    ruby = ruby;

    gemfile  = ./Gemfile;
    lockfile = ./Gemfile.lock;
    gemset   = ./gemset.nix;
  };

  buildInputs = [ makeWrapper ];

  phases = ["installPhase"];

  installPhase = ''
    mkdir -p $out/bin
    makeWrapper ${env}/bin/pws $out/bin/pws \
      --set PATH '"${xsel}/bin/:$PATH"'
  '';
}

We've added xsel as a dependency. We've also modified the installPhase script (the makeWrapper call, to be specific) to prepend the location of xsel (which is something like /nix/store/...-xsel/bin) to our PATH environment variable.

Now our package is done. You can follow Chapter 10, “Submitting changes”, of the nixpkgs manual to release your package to the public. Based on my experience, it's a very simple process. If you're a Ruby developer, I hope this guide got you closer to the Nix ecosystem; let me know if there are topics you would like to see covered. If you are interested in deploying your Ruby applications in a declarative way, sign up for our newsletter to get info on that topic.

How I hunted the most odd ruby bug

Every now and then a bug comes along in a programmer's life that is different from anything else you've encountered so far. Last time, for me, it was 3 days of debugging to find out that MySQL was returning incorrect results. We didn't have to do much to fix it: we removed an index and created it again from scratch. There, problem gone. But that was a few years ago.

Last week I was hunting an entirely different beast. But before we dive into details, let me tell you a bit about the business story behind it.

Business background

We are working on a ticketing platform which sells tickets for big events and festivals, but also for smaller gigs. The nature of this industry is that from time to time there are sales spikes when an organizer opens sales for hot tickets: early-bird discounts and other promotions. You've probably attended some conferences and concerts; you know how it works.

In our application there are tons of things that happen in the background, especially after a sale. We want the sale itself to be extremely quick so that we can handle the spikes nicely, without hiccups. So when a purchase is finalized we have 15 background jobs, or maybe even more. Some are responsible for generating the PDFs of the tickets, some for delivering emails with receipts. Others communicate with 3rd-party APIs responsible for delivering additional services to the buyers.

Let me show you what the sales spikes look like.

[Chart: sales spikes]

And their effect on the number of queued jobs that we had:

[Chart: number of queued background jobs]

As you can see, we were not even processing our jobs fast enough to handle the next spikes (other organizers and other events are still selling at the same time on our platform). During normal hours we were digging ourselves out from the 4K of queued jobs (sometimes we even had spikes to 40K jobs), but very slowly. So obviously we were not happy with our performance.

But I know our codebase, and I know that many of the background jobs do almost nothing. They run 1 or 2 SQL queries and decide that a given sale is not related to them. When the sale is related, they need to do something which does take a long time, but often they quit very quickly. Also, we had 6 machines with 4 resque workers running on each of them, so technically we had 24 processes for handling the load of jobs. And yet everything appeared to be… slow. Very slow.

I had this gut feeling that there is a hidden problem and we are missing something. We started investigating.

Confirming the problem

We logged in to our utility machines and simply checked with htop what the processes were doing, what the load was, etc.

[htop output on one of the utility machines]

The load was often between 0.2 and 0.4, on 2-CPU machines with 4 resque workers running on each of them, at a moment when they should have been processing thousands of queued jobs like crazy. That sounded ridiculously low.

We started watching the resque logs, which show when a job is taken and when it's done. I was able to see jobs being processed normally one by one, and then a moment later things stopped. A job was done, but the next job was not taken and started. Yet if you looked at the tree of processes on the server, you could clearly see that a master resque process was still waiting for its child, even though the child had already reported the job as done.

[Process tree: the resque master process still waiting for its child]

31480 is a child process handling a job, 30383 is the master process waiting for the child to finish. 31481 is a Ruby thread in the master Ruby process just trolling you in the htop output (use Shift+H to disable viewing threads in htop in the future). It took me a long time to understand what was going on, because both the parent and child processes had 2 additional threads, for Honeybadger and New Relic.

So let’s reiterate the confirmed problem.

  • The child process finished the job, according to the resque logs.
  • The master process is still waiting for the child process to die.

You must know that resque forks a child process for every job and waits for it to die before starting the next job in the queue.

Here is the related part of the code from resque:

def work(interval = 5.0, &block)
  interval = Float(interval)
  $0 = "resque: Starting"
  startup

  loop do
    break if shutdown?

    if not paused? and job = reserve
      log "got: #{job.inspect}"
      job.worker = self
      working_on job

      procline "Processing #{job.queue} since #{Time.now.to_i} [#{job.payload_class_name}]"
      if @child = fork(job)
        srand # Reseeding
        procline "Forked #{@child} at #{Time.now.to_i}"
        begin
          Process.waitpid(@child)
        rescue SystemCallError
          nil
        end
        job.fail(DirtyExit.new($?.to_s)) if $?.signaled?
      else
        # ...

Investigating the issue

This was very hard. I started by adding more log statements to the resque gem on production to see what was going on. I found nothing really interesting. All I was able to see was that the child process did the job and started firing at_exit hooks, but was still hanging there.

I used strace to see what was going on while the process was doing nothing, but it was not very helpful.

futex(0x7fb955978744, FUTEX_WAIT_PRIVATE, 13, NULL) = 0
futex(0x7fb9559787c0, FUTEX_WAKE_PRIVATE, 1) = 0
rt_sigaction(SIGINT, {SIG_IGN, [], SA_RESTORER|SA_SIGINFO, 0x7fb954a7f0f0}, {0x7fb954f10630, [], SA_RESTORER|SA_SIGINFO, 0x7fb954a7f0f0}, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER|SA_SIGINFO, 0x7fb954a7f0f0}, {SIG_IGN, [], SA_RESTORER|SA_SIGINFO, 0x7fb954a7f0f0}, 8) = 0
close(47) = 0
close(39) = 0
close(43) = 0
close(45) = 0
close(44) = 0
close(48) = 0
close(52) = 0
close(53) = 0
close(46) = 0
close(50) = 0
close(49) = 0
close(51) = 0
close(11) = 0
close(16) = 0
close(54) = 0
close(18) = 0
open("/dev/null", O_RDWR|O_CLOEXEC) = 11
dup2(11, 10) = 10
close(11) = 0
write(10, "\1\0\0\0\1", 5) = 5
shutdown(10, SHUT_RDWR) = -1 ENOTSOCK (Socket operation on non-socket)
close(10) = 0
close(9) = 0
close(8) = 0
close(17) = 0
close(14) = 0
close(15) = 0
close(12) = 0
close(13) = 0
open("/dev/null", O_RDWR|O_CLOEXEC) = 8
dup2(8, 55) = 55
close(8) = 0
write(55, "\1\0\0\0\1", 5) = 5
shutdown(55, SHUT_RDWR) = -1 ENOTSOCK (Socket operation on non-socket)
close(55) = 0
close(7) = 0
write(4, "!", 1) = 1
munmap(0x7fb955573000, 1052672) = 0
exit_group(0) = ?

I also used gdb to attach to the process and with thread apply all backtrace I was able to see:

Thread 3 (Thread 0x7fb955685700 (LWP 11705)):
#0  0x00007fb954b2baa3 in poll () from /lib64/libc.so.6
#1  0x00007fb954f8625a in thread_timer () from /usr/lib64/libruby21.so.2.1
#2  0x00007fb954837e46 in start_thread () from /lib64/libpthread.so.0
#3  0x00007fb954b3476d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7fb947df2700 (LWP 11708)):
#0  0x00007fb95483c8ee in pthread_cond_timedwait () from /lib64/libpthread.so.0
#1  0x00007fb954f805ce in native_cond_timedwait () from /usr/lib64/libruby21.so.2.1
#2  0x00007fb954f81042 in native_sleep () from /usr/lib64/libruby21.so.2.1
#3  0x00007fb954f83972 in sleep_timeval () from /usr/lib64/libruby21.so.2.1
#4  0x00007fb954ed0765 in rb_f_sleep () from /usr/lib64/libruby21.so.2.1
#5  0x00007fb954f63fba in vm_call_cfunc_with_frame () from /usr/lib64/libruby21.so.2.1
#6  0x00007fb954f71b1e in vm_call_general () from /usr/lib64/libruby21.so.2.1
#7  0x00007fb954f67fc1 in vm_exec_core () from /usr/lib64/libruby21.so.2.1
#8  0x00007fb954f6c8c9 in vm_exec () from /usr/lib64/libruby21.so.2.1
#9  0x00007fb954f74ee7 in loop_i () from /usr/lib64/libruby21.so.2.1
#10 0x00007fb954e60827 in rb_rescue2 () from /usr/lib64/libruby21.so.2.1
#11 0x00007fb954f62a0e in rb_f_loop () from /usr/lib64/libruby21.so.2.1
#12 0x00007fb954f63fba in vm_call_cfunc_with_frame () from /usr/lib64/libruby21.so.2.1
#13 0x00007fb954f71b1e in vm_call_general () from /usr/lib64/libruby21.so.2.1
#14 0x00007fb954f686e4 in vm_exec_core () from /usr/lib64/libruby21.so.2.1
#15 0x00007fb954f6c8c9 in vm_exec () from /usr/lib64/libruby21.so.2.1
#16 0x00007fb954f6efd2 in invoke_block_from_c () from /usr/lib64/libruby21.so.2.1
#17 0x00007fb954f6f7f6 in vm_invoke_proc () from /usr/lib64/libruby21.so.2.1
#18 0x00007fb954f6f9aa in rb_vm_invoke_proc () from /usr/lib64/libruby21.so.2.1
#19 0x00007fb954f82d42 in thread_start_func_2 () from /usr/lib64/libruby21.so.2.1
#20 0x00007fb954f82f2e in thread_start_func_1 () from /usr/lib64/libruby21.so.2.1
#21 0x00007fb954837e46 in start_thread () from /lib64/libpthread.so.0
#22 0x00007fb954b3476d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7fb955675700 (LWP 11703)):
#0  0x00007fb95483c56c in pthread_cond_wait () from /lib64/libpthread.so.0
#1  0x00007fb954f7f0e9 in native_cond_wait () from /usr/lib64/libruby21.so.2.1
#2  0x00007fb954f810d2 in native_sleep () from /usr/lib64/libruby21.so.2.1
#3  0x00007fb954f852f7 in rb_thread_terminate_all () from /usr/lib64/libruby21.so.2.1
#4  0x00007fb954e6203d in ruby_cleanup () from /usr/lib64/libruby21.so.2.1
#5  0x00007fb954e62275 in ruby_run_node () from /usr/lib64/libruby21.so.2.1
#6  0x00007fb955487a2b in main ()

Based on main in Thread 1, I assumed this was the main thread, and apparently there were 2 more threads. rb_thread_terminate_all looked interesting to me, but I could not google anything useful about this internal Ruby method.

I wasn't sure what those two threads were for or which gems would use them. I was able to figure it out later with one trick:

# config/initializers/resque.rb
Resque.after_fork do
  at_exit do
    Thread.list.each { |t| Resque.logger.info t.backtrace.join("\n") }
  end
end

which gave me:

config/initializers/resque.rb:19:in `backtrace'
config/initializers/resque.rb:19:in `block (3 levels) in <top (required)>'
config/initializers/resque.rb:19:in `each'
config/initializers/resque.rb:19:in `block (2 levels) in <top (required)>'

new_relic/agent/event_loop.rb:118:in `select'
new_relic/agent/event_loop.rb:118:in `wait_to_run'
new_relic/agent/event_loop.rb:104:in `run_once'
new_relic/agent/event_loop.rb:99:in `run'
new_relic/agent/agent.rb:630:in `create_and_run_event_loop'
new_relic/agent/agent.rb:690:in `block (2 levels) in deferred_work!'
new_relic/agent.rb:453:in `disable_all_tracing'
new_relic/agent/agent.rb:687:in `block in deferred_work!'
new_relic/agent/agent.rb:667:in `catch_errors'
new_relic/agent/agent.rb:686:in `deferred_work!'
new_relic/agent/agent.rb:713:in `block in start_worker_thread'
new_relic/agent/threading/agent_thread.rb:14:in `call'
new_relic/agent/threading/agent_thread.rb:14:in `block in create'

honeybadger/agent.rb:302:in `sleep'
honeybadger/agent.rb:302:in `work'
honeybadger/agent.rb:283:in `block in run'
honeybadger/agent.rb:283:in `loop'
honeybadger/agent.rb:283:in `run'
honeybadger/agent.rb:175:in `block (2 levels) in start'

So I had two potential candidates to investigate: New Relic and Honeybadger.

I grepped their codebases for Thread, sleep, at_exit and Thread.join and investigated the code around them. I was trying to find out if there could be a race condition: a situation in which the main Ruby thread, from an at_exit callback, would call Thread.join to wait for a Honeybadger or New Relic thread responsible for either collecting or sending data to them. But I could not find anything like that.
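For context, this is roughly the pattern I was grepping for. It is a purely hypothetical sketch of what such a gem could do, not code taken from either library:

# Hypothetical sketch of the pattern I was looking for -- neither New Relic
# nor Honeybadger actually does this. A background reporter thread joined
# from an at_exit hook would explain the main thread waiting.
reporter = Thread.new do
  loop do
    # collect and send data ...
    sleep 60
  end
end

at_exit do
  reporter.join # this would block process shutdown until the reporter finishes
end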

I configured the New Relic and Honeybadger gems to work in DEBUG mode on one machine. I was watching the logs, trying to spot something odd at the moments when I saw Resque doing nothing after a job was done. It was not an easy task, because the occurrences of those do-nothing periods were quite random. Sometimes I was able to spot one after 3 jobs were done. Other times 80 jobs had to be processed for the problem to appear. It was also not easy to spot any kind of similarity. But at some point I noticed one pattern.

The previous log listing that I showed you did not contain Honeybadger when the problem occurred. It looked like this:

config/initializers/resque.rb:19:in `backtrace'
config/initializers/resque.rb:19:in `block (3 levels) in <top (required)>'
config/initializers/resque.rb:19:in `each'
config/initializers/resque.rb:19:in `block (2 levels) in <top (required)>'

new_relic/agent/event_loop.rb:118:in `select'
new_relic/agent/event_loop.rb:118:in `wait_to_run'
new_relic/agent/event_loop.rb:104:in `run_once'
new_relic/agent/event_loop.rb:99:in `run'
new_relic/agent/agent.rb:630:in `create_and_run_event_loop'
new_relic/agent/agent.rb:690:in `block (2 levels) in deferred_work!'
new_relic/agent.rb:453:in `disable_all_tracing'
new_relic/agent/agent.rb:687:in `block in deferred_work!'
new_relic/agent/agent.rb:667:in `catch_errors'
new_relic/agent/agent.rb:686:in `deferred_work!'
new_relic/agent/agent.rb:713:in `block in start_worker_thread'
new_relic/agent/threading/agent_thread.rb:14:in `call'
new_relic/agent/threading/agent_thread.rb:14:in `block in create'

So I decided to focus more of my efforts on Honeybadger’s logs and codebase. I still could not figure out why anything threading-related in Honeybadger would cause an issue. So I paid more attention to the logs. I even edited the gem on production to produce much more logging. And then I noticed something.

This is what the logs looked like when everything went fine (line numbers might not reflect those in your codebase, as I added more debugging statements). One line is highlighted for your convenience.

15:27:27.137465 #11085] starting agent level=0 pid=11085 at="lib/honeybadger/agent.rb:172:in `block in start'"
15:27:27.138673 #11085] agent discarding trace duration=279.26 feature=traces id=c2ee2e5d-bfbf-4c9b-84dd-5a959dc125ac level=0 pid=11085 at="lib/honeybadger/agent.rb:236:in `trace'"
15:27:27.140279 #11085] stopping agent level=0 pid=11085 at="lib/honeybadger/agent.rb:182:in `stop'"
15:27:27.140410 #11085] cleared pid level=0 pid=11085 at="lib/honeybadger/agent.rb:187:in `stop'"
15:27:27.140550 #11085] killed thread level=0 pid=11085 at="lib/honeybadger/agent.rb:191:in `stop'"

15:27:27.140727 #11085] stopping agent level=0 pid=11085 at="lib/honeybadger/agent.rb:296:in `ensure in run'"

15:27:27.140874 #11085] flushed traces level=0 pid=11085 at="lib/honeybadger/agent.rb:195:in `stop'"
15:27:27.141150 #11085] flushed metrics level=0 pid=11085 at="lib/honeybadger/agent.rb:197:in `stop'"
15:27:27.141271 #11085] shuutting down worker notices level=0 pid=11085 at="lib/honeybadger/agent.rb:201:in `block in stop'"
15:27:27.141436 #11085] shut down worker notices level=0 pid=11085 at="lib/honeybadger/agent.rb:203:in `block in stop'"
15:27:27.141553 #11085] shuutting down worker traces level=0 pid=11085 at="lib/honeybadger/agent.rb:201:in `block in stop'"
15:27:27.141683 #11085] shut down worker traces level=0 pid=11085 at="lib/honeybadger/agent.rb:203:in `block in stop'"
15:27:27.141810 #11085] shuutting down worker metrics level=0 pid=11085 at="lib/honeybadger/agent.rb:201:in `block in stop'"
15:27:27.141946 #11085] shut down worker metrics level=0 pid=11085 at="lib/honeybadger/agent.rb:203:in `block in stop'"

And compare it to:

15:27:27.806954 #11091] starting agent level=0 pid=11091 at="lib/honeybadger/agent.rb:172:in `block in start'"
15:27:27.807586 #11091] agent discarding trace duration=222.84 feature=traces id=a55c3984-4835-468a-a89e-d7ddf4a0b499 level=0 pid=11091 at="lib/honeybadger/agent.rb:236:in `trace'"
15:27:27.810092 #11091] stopping agent level=0 pid=11091 at="lib/honeybadger/agent.rb:182:in `stop'"
15:27:27.810326 #11091] cleared pid level=0 pid=11091 at="lib/honeybadger/agent.rb:187:in `stop'"
15:27:27.810553 #11091] killed thread level=0 pid=11091 at="lib/honeybadger/agent.rb:191:in `stop'"
15:27:27.810813 #11091] flushed traces level=0 pid=11091 at="lib/honeybadger/agent.rb:195:in `stop'"
15:27:27.811129 #11091] flushed metrics level=0 pid=11091 at="lib/honeybadger/agent.rb:197:in `stop'"
15:27:27.811368 #11091] shuutting down worker notices level=0 pid=11091 at="lib/honeybadger/agent.rb:201:in `block in stop'"
15:27:27.811576 #11091] shut down worker notices level=0 pid=11091 at="lib/honeybadger/agent.rb:203:in `block in stop'"
15:27:27.811761 #11091] shuutting down worker traces level=0 pid=11091 at="lib/honeybadger/agent.rb:201:in `block in stop'"
15:27:27.812008 #11091] shut down worker traces level=0 pid=11091 at="lib/honeybadger/agent.rb:203:in `block in stop'"
15:27:27.812246 #11091] shuutting down worker metrics level=0 pid=11091 at="lib/honeybadger/agent.rb:201:in `block in stop'"
15:27:27.812525 #11091] shut down worker metrics level=0 pid=11091 at="lib/honeybadger/agent.rb:203:in `block in stop'"

15:27:37.813214 #11091] stopping agent level=0 pid=11091 at="lib/honeybadger/agent.rb:296:in `ensure in run'"

This time the second stopping agent entry, the one from ensure in run, appears in the log exactly 10 seconds after everything else.

Digging into Honeybadger’s code

And guess what I remembered from reading the Honeybadger codebase:

def initialize(config)
  @config = config
  @delay = config.debug? ? 10 : 60

And here is where that delay is used, inside the work method:

def run
  loop { work } # <<-- HERE
rescue Exception => e
  error {
    msg = "error in agent thread (shutting down) class=%s message=%s\n\t%s"
    sprintf(msg, e.class, e.message.dump, Array(e.backtrace).join("\n\t"))
  }
ensure
  d { sprintf('stopping agent') }
end

def work
  flush_metrics if metrics.flush?
  flush_traces if traces.flush?
rescue StandardError => e
  error {
    msg = "error in agent thread class=%s message=%s\n\t%s"
    sprintf(msg, e.class, e.message.dump, Array(e.backtrace).join("\n\t"))
  }
ensure
  sleep(delay) # <<-- HERE
end

And the work method is called from inside the run method, which is what Honeybadger schedules in a separate thread:

def start
  mutex.synchronize do
    return false unless pid
    return true if thread && thread.alive?

    debug { 'starting agent' }

    @pid = Process.pid
    @thread = Thread.new { run } # <<-- HERE
  end

  true
end

OK, so we have a thread running which sometimes sleeps. But how would that affect us? Why would it matter? After all, when the main Ruby thread finishes, it does not care about other threads that are still running. Also, Honeybadger is not calling Thread.join at any point to wait for that thread. If anything, it’s doing the opposite. Check it out. It’s killing the thread.

at_exit do
  stop if config[:'send_data_at_exit']
end

def stop(force = false)
  debug { 'stopping agent' }

  mutex.synchronize do
    @pid = nil
  end

  # Kill the collector
  Thread.kill(thread) if thread # <<-- HERE

  unless force
    flush_traces
    flush_metrics
  end

  workers.each_pair do |key, worker|
    worker.send(force ? :shutdown! : :shutdown)
  end

  true
end

Epiphany

At some point I realized a fact that had somehow escaped my attention. There are two cases of what can happen inside the work method.

Imagine that an exception is raised while we sleep:

def work
  flush_metrics if metrics.flush?
  flush_traces if traces.flush?
rescue StandardError => e
  error {
    msg = "error in agent thread class=%s message=%s\n\t%s"
    sprintf(msg, e.class, e.message.dump, Array(e.backtrace).join("\n\t"))
  }
ensure
  sleep(delay) # <- Exception here
end

What would happen is that we would quit the ensure block and the exception would bubble up. That’s an easy and harmless scenario.

But what happens when the exception is raised inside one of the flush methods?

def work
  flush_metrics if metrics.flush? # <- Exception here
  flush_traces if traces.flush?
rescue StandardError => e
  error {
    msg = "error in agent thread class=%s message=%s\n\t%s"
    sprintf(msg, e.class, e.message.dump, Array(e.backtrace).join("\n\t"))
  }
ensure
  sleep(delay)
end

We land inside the ensure block and we sleep. That was an interesting revelation for me. It seemed important, and similar to what might randomly happen in real life. A race condition in action.

And a thread being killed is like an exception being raised in your Ruby code.
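You can see this with a minimal sketch of my own (not code from Honeybadger): the ensure block of a killed thread still runs, exactly as if an exception had been raised inside it.

# Minimal sketch: Thread.kill behaves like raising an exception inside the
# thread, so its ensure blocks still get executed.
t = Thread.new do
  begin
    sleep # block until killed
  ensure
    puts "ensure block of the killed thread still runs"
  end
end

sleep 0.1      # give the thread time to start and block
Thread.kill(t) # kill it, just like Honeybadger's stop method does
t.join         # wait for the killed thread so we can see the output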

But I still could not find any Ruby code that would actually wait for that thread. Yet everything pointed to the fact that something did indeed wait for it.

In which we bring Ruby as a witness

So I decided to write a very small piece of Ruby code, similar to what Honeybadger does, just to convince myself that Ruby is not guilty here and that I am clearly missing something which causes this whole situation to occur.

Here is what I wrote.

require 'securerandom'

class MyThread < ::Thread; end

def delay
  15
end

def run
  loop { work }
rescue Exception => e
  puts "#{Time.now} Exception"
ensure
  puts "#{Time.now} stopping agent"
end

def work
  puts "#{Time.now} start work"
  10_000_000.times { SecureRandom.hex }
  puts "finished work"
rescue StandardError => e
  puts "#{Time.now} Error"
ensure
  puts "#{Time.now} start sleep"
  sleep(delay)
  puts "#{Time.now} finished sleep"
end

t = MyThread.new { run }

at_exit do
  puts "#{Time.now} killing thread"
  Thread.kill(t)
  puts "#{Time.now} killed thread"
end

sleep(10)
exit

The 10_000_000.times { SecureRandom.hex } line is just there so that the work takes longer than the 10 seconds we wait in the main thread before trying to exit the whole program.

And here is the output:

2016-04-15 11:07:09 +0200 start work
2016-04-15 11:07:19 +0200 killing thread
2016-04-15 11:07:19 +0200 killed thread
2016-04-15 11:07:19 +0200 stopping agent

Well, nothing interesting here. I was obviously right. Ruby does not wait for other threads to finish. So the program stopped after 10 seconds. What else would you expect? But you never know, so let’s execute it again.

2016-04-15 11:07:26 +0200 start work
2016-04-15 11:07:36 +0200 killing thread
2016-04-15 11:07:36 +0200 killed thread
2016-04-15 11:07:36 +0200 start sleep
2016-04-15 11:07:51 +0200 finished sleep
2016-04-15 11:07:51 +0200 stopping agent

And here we are with our problem reproduced. This time Ruby waited for the thread t to finish. So it waited for the 15-second sleep delay inside the ensure block. I did not see that coming.

I was able to randomly trigger this behavior on Ruby 2.1, 2.2 and 2.3. The behavior of this program was non-deterministic for me: sometimes it waited and sometimes it did not.
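One way to see the nondeterminism is to run the script in a loop and time each run. A rough sketch, assuming the script above is saved as repro.rb (an assumed filename):

# Run the reproduction script a few times and print how long each run took.
# When the bug does not trigger, a run takes about 10s (the main thread sleep);
# when it does, about 25s (10s plus the 15s sleep in the ensure block).
5.times do |i|
  started = Time.now
  system("ruby", "repro.rb")
  puts "run #{i + 1}: #{(Time.now - started).round(1)}s"
end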

Hotfixing

Now that I was sure I had not missed any code in Honeybadger, and that for a so-far-unknown reason Ruby just waits for this thread to finish, it was time to hotfix the problem.

I knew I could just use the send_data_at_exit config option to avoid the problem. But I wanted to have exceptions from background jobs logged, and since a job finishes and quits immediately when there is an exception, I was afraid we would miss them. So I wanted to dig into the root cause and find a fix or workaround that would still let us send the last data when the program quits.

Here is my very simple hotfix. It skips the sleep phase if the thread is aborting, which is the state it is in after being killed with Thread.kill.

if Honeybadger::VERSION != "2.1.0"
  raise "You've upgraded the gem. Check if the hotfix still applies " \
        "in an identical way! They might have changed the #work method body."
end

Honeybadger::Agent.class_eval do
  def work
    flush_metrics if metrics.flush?
    flush_traces if traces.flush?
  rescue StandardError => e
    error {
      msg = "error in agent thread class=%s message=%s\n\t%s"
      sprintf(msg, e.class, e.message.dump, Array(e.backtrace).join("\n\t"))
    }
  ensure
    sleep(delay) unless Thread.current.status == "aborting" # 'unless' is the hotfix
  end
end
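A quick sanity check of the assumption the hotfix relies on (again a sketch of my own, not code from the gem): inside the ensure block of a thread killed with Thread.kill, Thread.current.status reports "aborting".

# Minimal sketch verifying the status check used in the hotfix above.
t = Thread.new do
  begin
    sleep
  ensure
    # A thread that has been killed reports the "aborting" status
    # while its ensure blocks are still running.
    puts "status inside ensure: #{Thread.current.status.inspect}"
  end
end

sleep 0.1
Thread.kill(t)
t.join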

I notified the Honeybadger, Resque and Ruby teams about this problem and I hope they will come up with a good coldfix.

Results

The spectacular effect was immediately visible in all the metrics that we had.

The remaining jobs were processed much, much faster after the hotfix deployment.


The CPU usage on the utility machines became much higher for the time they were processing the spike.


I suspect that with this improvement we will even be able to decommission some of them.

Learn more

Did you like this article? You will find our Rails books interesting as well. We especially recommend Responsible Rails, which contains more stories like this one.




Custom type-casting with ActiveRecord, Virtus and dry-types

In Drop this before validation and just use a setter method I showed you how to avoid the common pattern of using before_validation to fix the data. Instead, I proposed that you just overwrite the setter, call your custom logic there and use super. I also showed you what you can do if you can’t easily call super.
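As a reminder, that setter-based approach looks roughly like this (a hypothetical Post class, not an excerpt from that post):

# Sketch of the setter-override approach: fix the data in the attribute
# writer itself and delegate to the Active Record generated setter via super.
class Post < ActiveRecord::Base
  def title=(value)
    super(value.to_s.strip)
  end
end

Post.new(title: " Use Rails ").title # => "Use Rails"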

But sometimes, to properly transform the incoming data or attributes, you just need to improve the type-casting logic. And that’s it. So let’s see how you can add your own custom type-casting rules to a project.

And let’s continue with the simple example of a stripped string.

Active Record 4.2+

class StrippedString < ActiveRecord::Type::String
  def cast_value(value)
    value.to_s.strip
  end
end

class Post < ActiveRecord::Base
  attribute :title, StrippedString.new
end

p = Post.new(title: " Use Rails ")
p.title # => "Use Rails"

Virtus

class StrippedString < Virtus::Attribute
  def coerce(value)
    value.to_s.strip
  end
end

class Address
  include Virtus.model
  include ActiveModel::Validations

  attribute :country_code, String
  attribute :street,       StrippedString
  attribute :zip_code,     StrippedString
  attribute :city,         StrippedString
  attribute :full_name,    StrippedString

  validates :country_code,
    :street,
    :zip_code,
    :city,
    :full_name,
    presence: true
end

a = Address.new(city: " Wrocław ")
a.city # => "Wrocław"

dry-types 0.6

module Types
  include Dry::Types.module

  StrippedString = String.constructor(->(val){ String(val).strip })
end

class Post < Dry::Types::Struct
  attribute :title, Types::StrippedString
end

p = Post.new(title: " Use dry ")
p.title # => "Use dry"

Conclusion

Whether you want to improve type casting for your Active Record class or you need it for a different layer (e.g. a Form Object or a Command Object), in both cases you are covered.

Historically, we have been using Virtus for those non-persistable layers. But with the recent release of dry-types (part of dry-rb) we also started investigating this angle, as it looks very promising. I am very happy with the improvements added between the 0.5 and 0.6 releases. Definitely a step in the right direction.

Summary

That’s it. That’s the entire lesson. If you want more free lessons on how to improve your Rails codebase, subscribe to our mailing list below. We will regularly send you valuable tips and tricks. 3200 developers have already trusted us.

More

Did you like this article? You will definitely find our Rails books interesting as well.
