Monthly Archives: January 2019

How to quickly add graphs and charts to Rails app

When deciding how to visualize data in your Rails app there are certain factors that you need to consider.

  • Static graphs which generate images are out of the question. They are not any simpler to use, install or maintain and are less usable. The ability to toggle and highlight is just a necessity in the 21st century. Thus our options are limited to charts generated with JavaScript.
  • You are probably working for a startup with monetary constraints, so libraries which cost $200 are something you might want to avoid.
  • You would prefer something looking good out of the box, which can also be easily styled by designers to follow the look and feel of the whole app.
  • You would like something maintained so it continues working in the future with newer browser versions without spending much time on upgrades.

I am gonna propose that you use Google Charts. It is interactive and maintained by Google.

Model + SQL

    class Order < ApplicationRecord
      def self.totals_by_year_month
        find_by_sql(<<-SQL
          SELECT
            date_trunc('month', created_at) AS year_month,
            sum(amount) as amount
          FROM orders
          GROUP BY year_month
          ORDER BY year_month, amount
          SQL
        ).map do |row|
          [
            row['year_month'].strftime("%B %Y"),
            row.amount.to_f,
          ]
        end
      end
    end

  • date_trunc is a PostgreSQL function which truncates the date to certain precision.

This method returns the data in a format such as the following (rows are ordered by month, oldest first):

    [
      ["July 2016", 50.0],
      ["July 2017", 346.0],
    ]
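
If your database has no date_trunc, the same aggregation can be done in Ruby. Below is a minimal sketch, not part of the original solution: OrderRow is a hypothetical stand-in for the Order model with only the two columns used above, and the grouping happens in memory, so it only makes sense for small data sets.

```ruby
# Hypothetical stand-in for an Order record (assumption, not the real model).
OrderRow = Struct.new(:created_at, :amount)

# Group orders by calendar month, sum the amounts and format the label
# the same way as the SQL-backed method does ("July 2017").
def totals_by_year_month(orders)
  orders
    .group_by { |o| Time.utc(o.created_at.year, o.created_at.month) }
    .sort_by { |month, _rows| month }
    .map { |month, rows| [month.strftime("%B %Y"), rows.sum(&:amount).to_f] }
end

orders = [
  OrderRow.new(Time.utc(2017, 7, 3), 300),
  OrderRow.new(Time.utc(2016, 7, 12), 50),
  OrderRow.new(Time.utc(2017, 7, 21), 46),
]

p totals_by_year_month(orders)
# => [["July 2016", 50.0], ["July 2017", 346.0]]
```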

Obviously it is up to you what data you visualize and how 🙂 This is just a simple example.

Get your data in controller

    class OrdersController < ApplicationController
      def index
        @data = Order.totals_by_year_month
      end
    end

Pass to view and javascript

    <div id="chart" style="width: auto; height: 600px;"></div>

    <script type="text/javascript" src="https://www.gstatic.com/charts/loader.js"></script>
    <%= javascript_tag do -%>
      google.charts.load('current', {'packages':['bar']});
      google.charts.setOnLoadCallback(drawChart);

      function drawChart() {
        var data = JSON.parse('<%= @data.to_json.html_safe -%>');
        data = [['Year/Month', 'Amount']].concat(data);
        data = google.visualization.arrayToDataTable(data);
        var options = {
          chart: {
            title: 'Sales by year',
          }
        };

        var chart = new google.charts.Bar(document.getElementById('chart'));
        chart.draw(data, google.charts.Bar.convertOptions(options));
      }
    <% end -%>

And that’s it. If your needs are simple, if you don’t need a chart which dynamically changes values, if you just want to draw a diagram, that’s enough.

There are certain refactorings that you may want to apply once your needs get more sophisticated, if you want to treat JavaScript and frontend code as first-class citizens in your Rails app.

  • Expose the data via JSON API and obtain it using AJAX
  • Dynamically translate the column names and chart title
  • Move the JavaScript to a separate file and trigger the integration based on certain HTML tags being present on the site
  • Asynchronously load the required JavaScript from google: https://www.gstatic.com/charts/loader.js
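
As a rough illustration of the first two refactorings, here is a sketch of building a JSON payload that carries the translated title and column names together with the rows. LABELS and chart_payload are hypothetical names; in a Rails app the translations would more likely come from I18n and the hash would be returned from a controller with render json:.

```ruby
require "json"

# Hypothetical translation table (assumption); a Rails app would use I18n.
LABELS = {
  en: { title: "Sales by month", columns: ["Year/Month", "Amount"] },
  de: { title: "Umsatz pro Monat", columns: ["Jahr/Monat", "Betrag"] },
}.freeze

# Builds everything the chart needs: a title plus the header row followed
# by the data rows, in the shape arrayToDataTable expects.
def chart_payload(rows, locale: :en)
  labels = LABELS.fetch(locale)
  { title: labels[:title], rows: [labels[:columns]] + rows }
end

payload = chart_payload([["July 2016", 50.0], ["July 2017", 346.0]])
puts JSON.generate(payload)
# => {"title":"Sales by month","rows":[["Year/Month","Amount"],["July 2016",50.0],["July 2017",346.0]]}
```

The JavaScript side would then fetch this JSON and pass rows straight to google.visualization.arrayToDataTable.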

You can read more about creating bar charts using Google Charts and check out tons of available configuration options

nil?, empty?, blank? in Ruby on Rails – what’s the difference actually?

There are plenty of options available. Let’s evaluate their usefulness and potential problems that they bring to the table.

nil?

  • Provided by Ruby
  • Can be used on anything
  • Will return true only for nil

    nil.nil? # => true
    false.nil? # => false
    0.nil? # => false
    "".nil? # => false

empty?

  • Provided by Ruby
  • Can be used on collections such as Array, Hash, Set etc. Returns true when they have no elements.

    [].empty? # => true
    {}.empty? # => true
    Set.new.empty? # => true

  • but it is not included in Enumerable. Not every object which iterates and returns values knows if it has any value to return

    fib = Enumerator.new do |y|
      a = b = 1
      loop do
        y << a
        a, b = b, a + b
      end
    end

    fib.empty? # NoMethodError: undefined method `empty?' for #<Enumerator: 

  • It can also be used on Strings (because you can think of String as a collection of bytes/characters)

    "".empty? # => true
    " ".empty? # => false

  • The problem with empty? is that you need to know the class of the object to be sure you won’t get an exception. If you don’t know if an object is an Array or nil then using empty? alone is not safe. You need tedious double protection.

    object = rand > 0.5 ? nil : array
    object.empty? # can raise an exception

    if !object.nil? && !object.empty? # doh...
      # do something
    end

This is where Rails comes with ActiveSupport extensions and defines blank? Let’s see how.

blank?

  • Provided by Rails
  • nil and false are obviously blank.

    class NilClass
      def blank?
        true
      end
    end

    class FalseClass
      def blank?
        true
      end
    end

  • true obviously is not

    class TrueClass
      #   true.blank? # => false
      def blank?
        false
      end
    end

  • Array and Hash are blank? when they are empty?. This is implemented using alias_method. You might wonder what about Set. This will be explained in a moment.

    class Array
      #   [].blank?      # => true
      #   [1,2,3].blank? # => false
      alias_method :blank?, :empty?
    end

    class Hash
      #   {}.blank?                # => true
      #   { key: 'value' }.blank?  # => false
      alias_method :blank?, :empty?
    end

  • String#blank? behavior was changed compared to what Ruby does with String#empty? to account for whitespace

    class String
      BLANK_RE = /\A[[:space:]]*\z/

      # A string is blank if it's empty or contains whitespaces only:
      #
      #   ''.blank?       # => true
      #   '   '.blank?    # => true
      #   "\t\n\r".blank? # => true
      #   ' blah '.blank? # => false
      #
      # Unicode whitespace is supported:
      #
      #   "\u00a0".blank? # => true
      #
      def blank?
        # The regexp that matches blank strings is expensive. For the case of empty
        # strings we can speed up this method (~3.5x) with an empty? call. The
        # penalty for the rest of strings is marginal.
        empty? || BLANK_RE.match?(self)
      end
    end

This is convenient for web applications because you often want to reject or handle differently strings which contain only invisible spaces.

  • The logic for every other class is that if it implements empty? then that’s what is going to be used. It’s interesting to see that the method and its behavior were documented fully here.

    class Object
      # An object is blank if it's false, empty, or a whitespace string.
      # For example, +false+, '', '   ', +nil+, [], and {} are all blank.
      #
      # This simplifies
      #
      #   !address || address.empty?
      #
      # to
      #
      #   address.blank?
      #
      # @return [true, false]
      def blank?
        respond_to?(:empty?) ? !!empty? : !self
      end
    end

!!empty? – is just a double negation of empty?. This is useful in case empty? returned nil or a string or a number, something different than true or false. That way the returned value is always converted to a boolean value.

    !!true # => true
    !!false # => false
    !!nil # => false
    !!0 # => true
    !!"abc" # => true

If you implement your own class and define empty? method it will effortlessly work as well.

    class Car
      def initialize
        @passengers = []
      end

      def enter(passenger)
        @passengers << passenger
      end

      def empty?
        @passengers.empty?
      end

      def run
        # ...
      end
    end

    car = Car.new
    car.blank? # => true

    car.enter("robert")

    car.blank? # => false

  • No number or Time is blank. Frankly I don’t know why these methods were implemented separately here and why the implementation from Object is not enough. Perhaps for speed of not checking if they have empty? method which they don’t…

    class Numeric #:nodoc:
      #   1.blank? # => false
      #   0.blank? # => false
      def blank?
        false
      end
    end

    class Time #:nodoc:
      #   Time.now.blank? # => false
      def blank?
        false
      end
    end

present?

  • Provided by Rails
  • present? is just a negation of blank? and can be used on anything.

    class Object
      # An object is present if it's not blank.
      def present?
        !blank?
      end
    end

presence

Provided by Rails. Sometimes you would like to write a logic such as:

params[:state] || params[:country] || 'US' 

but because the parameters can come from forms, they might be empty (or whitespaced) strings and in such case you could get '' as a result instead of 'US'. This is where presence comes in handy.

Instead of

    state   = params[:state]   if params[:state].present?
    country = params[:country] if params[:country].present?
    region  = state || country || 'US'

you can write

params[:state].presence || params[:country].presence || 'US' 

The implementation is very simple:

    class Object
      def presence
        self if present?
      end
    end
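
To see the difference without booting Rails, here is a small sketch with simplified stand-ins for blank? and presence written as plain functions. This is an approximation for illustration only, not the ActiveSupport implementation, and the params hash is made up.

```ruby
# Simplified stand-ins for ActiveSupport's blank?/presence (assumption:
# plain functions instead of methods on Object, so no Rails is needed).
def blank?(object)
  case object
  when nil, false then true
  when String     then object.match?(/\A[[:space:]]*\z/)
  else object.respond_to?(:empty?) ? !!object.empty? : false
  end
end

def presence(object)
  object unless blank?(object)
end

# Typical form input: the fields were submitted, but left empty.
params = { state: "", country: "   " }

naive  = params[:state] || params[:country] || "US"
region = presence(params[:state]) || presence(params[:country]) || "US"

p naive   # => ""
p region  # => "US"
```

The naive version stops at the first non-nil value, which is the empty string; the presence version treats empty and whitespace-only strings as missing.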

So which one should you use?

If you are working in Rails I recommend using present? and/or blank?. They are available on all objects, work intuitively well (by following the principle of least surprise) and you don’t need to manually check for nil anymore.

Was this helpful?

If you liked this explanation please consider sharing this link on:

  • your company’s Slack or other chat – for the benefit of your coworkers
  • Facebook & Twitter – for fellow developers who you are in touch with

Monitoring Sidekiq queues with middlewares

Sidekiq, similarly to Rack, has a concept of middlewares. A list of wrappers around its processing logic that you can use to include custom behavior.

In chillout we use it to collect and send a number of metrics:

  • how long did it take to process a job

    Obviously it is nice to notice when a certain job starts to work much slower than usual.

  • how long did it take between scheduling a job and starting a job

    This is useful to know whether your Sidekiq workers are saturated. Ideally the numbers should be around 1-2ms, which means you are processing everything as it comes and have no delay.

    Depending on what your application does a second or two of a delay might be good enough as well. But if the number is getting higher it means you are having problems and maybe you need more machines, threads or just investigate a temporary issue.

    If it is one job causing you problems, check out your options in Handle sidekiq processing when one job saturates your workers and the rest queue up.

    I used to think that the number of unprocessed jobs is a good metric, but I think this is better. It doesn’t matter if you have 1 or 10_000 jobs waiting if you can start all of them very quickly because you have enough workers and the jobs are processed very quickly.

    The delay before processing is a better indicator than queue size. Because you don’t know if you have 1000 jobs which take 10ms each, or 1 job which takes 10 minutes to finish. And all you care about is the effect on other jobs waiting in queues.

  • did it finish successfully or with a failure

    So that one can monitor a failure rate

  • queue and job names

    To have granular metrics per jobs and queues.

The code is very simple and nicely explained in Sidekiq documentation so if you want to build your own logging or monitoring, it’s not hard.

    class SidekiqMonitor
      def initialize(options)
        @client = options.fetch(:client)
      end

      def call(_worker, job, queue)
        started = Time.now.utc
        success = false
        yield
        success = true
      ensure
        enqueue(queue, job, started, success)
      end

      def enqueue(queue, job, started, success)
        finished = Time.now.utc
        @client.enqueue(SidekiqJobMeasurement.new(
          job,
          queue,
          started,
          finished,
          success
        ))
      end
    end

    class SidekiqJobMeasurement
      attr_reader :retriable, :queue, :started,
        :finished, :delay, :duration, :success

      def initialize(job, queue, started, finished, success)
        @class     = job["class"].to_s
        @retriable = job["retry"].to_s
        @queue     = queue
        @started   = started.utc
        @finished  = finished.utc
        enqueued_at = job["enqueued_at"]
        @delay = 1000.0 * (@started.to_f - enqueued_at)
        @duration = 1000.0 * (@finished.to_f - @started.to_f)
        @success = success.to_s
      end
    end

    Sidekiq.server_middleware.add SidekiqMonitor,
      client: client
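
Once measurements like these are collected, the interesting numbers are the percentiles of the delay and the failure rate. Here is a minimal sketch of such a summary. The Measurement struct, percentile and summarize are hypothetical names for this example, and success is a plain boolean here rather than the string used above.

```ruby
# Hypothetical, simplified measurement: delay and duration in milliseconds.
Measurement = Struct.new(:queue, :delay, :duration, :success)

# Nearest-rank percentile of a list of numbers.
def percentile(values, pct)
  sorted = values.sort
  rank = (pct / 100.0 * sorted.size).ceil
  sorted[[rank, 1].max - 1]
end

# Summarize the delay before processing and how often jobs fail.
def summarize(measurements)
  delays = measurements.map(&:delay)
  failures = measurements.count { |m| !m.success }
  {
    p50_delay_ms: percentile(delays, 50),
    p95_delay_ms: percentile(delays, 95),
    failure_rate: failures.fdiv(measurements.size),
  }
end

measurements = [
  Measurement.new("default", 1.2, 10.0, true),
  Measurement.new("default", 1.8, 12.0, true),
  Measurement.new("default", 2.1, 11.0, true),
  Measurement.new("mailers", 950.0, 60_000.0, false),
]

summary = summarize(measurements)
puts summary[:p50_delay_ms]  # 1.8
puts summary[:p95_delay_ms]  # 950.0
puts summary[:failure_rate]  # 0.25
```

A median delay of a couple of milliseconds with a p95 close to a second is the kind of signal that tells you some jobs wait a long time even though the queue looks healthy on average.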

Effect:

[Image: Monitoring Sidekiq queues with middlewares]

Testing middlewares is also easy:

    def setup
      @client = mock("Client")
      Sidekiq::Testing.server_middleware.add SidekiqMonitor,
        client: @client
    end

    def teardown
      Sidekiq::Testing.server_middleware.clear
    end

    class EmptyJob
      include Sidekiq::Worker
      def perform; end
    end

    def test_enqueues_stats
      @client.expects(:enqueue).with do |measurement|
        SidekiqJobMeasurement === measurement
      end
      Sidekiq::Testing.inline! { EmptyJob.perform_async }
    end

    class ErrorJob
      Doh = Class.new(StandardError)
      include Sidekiq::Worker
      def perform
        raise Doh
      end
    end

    def test_enqueues_stats_even_on_failure
      @client.expects(:enqueue).with do |measurement|
        SidekiqJobMeasurement === measurement &&
          measurement.success == "false"
      end
      Sidekiq::Testing.inline! do
        assert_raises(ErrorJob::Doh) do
          ErrorJob.perform_async
        end
      end
    end

Non-coding activities in a software project

Recently in our project, we came up with a list of non-coding activities. Those are the tasks that need to be done quite regularly and are easy to forget.

If we tend to forget them, then there’s a risk that someone else will introduce a process around those activities. Sometimes it may mean new people will be brought in so that they “manage” those activities. In my opinion, the more can be done by a developer the better, because we don’t introduce non-technical people to the communication loop.

  • read communication on pivotal (and optionally reply)
  • read communication on slack (and optionally reply)
  • build/monitoring failures
  • review commits from others
  • work on our tickets
  • look at exception notifications
  • create new tickets based on build failures or exceptions
  • challenge the prios in backlog
  • check security updates for gems
  • document higher level concepts (architecture, tracking, technical debt)
  • remove dead code, unused feature toggles

I will probably keep updating this list, as this may serve our team in the longer run. If you feel that we miss something important here, feel free to comment, thanks!

How to keep yourself motivated for blogging?

You got your programming blog, but you don’t blog too much? You don’t feel like doing it? What can you do about it?

Remember why

There is a reason why you started or want to start blogging. Write it down and remember why you are doing something, what you are doing it for. It might be:

  • you want to change your job: you don’t like your team, you don’t enjoy working with them, you want to work in a better or more experienced team, or in a different technology
  • you get nicely paid but your work is boring, blog is a way to escape it for a little time and do something more interesting, something different
  • you want to get your first programming job
  • you want to speak at programming conferences
  • travel around the world
  • be recognized as an expert, as a valuable member of your programming language community
  • you need clients as a freelancer
  • you want to start a programming agency, have your own team
  • you want to help other developers write better code and stay connected with them

Whatever your goal is, remember about it. There is something you are trying to achieve here. There is a dream you have. Don’t forget the dream.

For me this is Independence. I like to organize my time and my tasks, and to do things that I like 😉 Without anyone telling me what to work on, how fast, in which way. This drives me. When I remind myself why I am doing what I am doing, why I am blogging, recording, writing a book, coding right now, it gives me a boost. There is more energy and less apathy.

Track your progress and have a reminder

Say you want to deliver 20 blog posts in a year. Around one every second week. Set up a reminder in Trello or Google Calendar or printed calendar, whatever you use. It’s important for you to see a scheduled block of time for writing or a deadline for delivery.

Timebox and reward

You need to find your style of work. Do you like to schedule an hour or two, focus on a single task and be done with it? Is that your style?

Or do you prefer to work in many smaller chunks of work? Here is one technique that I sometimes use:

  • watch a Netflix episode (usually 30-60m)
  • spend 10-30min on blogging
  • watch another Netflix episode
  • optionally spend time on blogging again

It works for me because:

  • usually after a single episode I am already more relaxed and less tired, so I can deliver something
  • I know I won’t be blogging (which requires mental energy) for long. I timebox it.
  • I anticipate a reward, the next episode is waiting for me. I gave myself permission for it.

On other days I use a different technique. I schedule a dedicated two-hour block of time and I immerse myself in writing.

Find out what works best for you.

Remember about your other dreams

The number 1 reason I am lazy and I don’t want to do anything, including writing a blog-post? Not enough joy. I sometimes forget to have fun myself during a whole week. It happens sometimes when we chase our goals and dreams but we don’t appreciate what we have, we don’t stop to notice and use what we already achieved. It’s the routine killing us.

You know the drill:

  • Wake up
  • Get ready
  • Commute to work (if you are not lucky enough to work remotely)
  • Give your best for 8 hours (+ lunch)
  • Come back
  • Take care of house, kids, dishes, everything
  • Have 0 remaining energy for anything.

What’s the recipe? Only one thing works for me. Consciously scheduling time for joy. Remembering what I like and doing it. I grouped my favorite activities into a few categories:

  • physical
    • soccer
    • karting
    • snowboard
    • biking
  • intellectual
    • learning more about programming
    • reading about the brain
    • finding out more about sales
    • science-fiction books
  • emotional
    • hearing
  • spiritual
    • vegetarian
    • meditation
    • buddhism
    • open-source
  • visual
    • computer games
    • cinema
  • kinesthetic
    • cooking
    • computer games
  • audio
    • exploring new music on Spotify
  • social
    • board games
    • cinema with friends
    • multiplayer computer games
    • coworking

I noticed that often I don’t balance those activities properly. For example, I like learning more about programming and sales, I always have a few books about these topics around myself. But… sometimes it feels that I am forcing myself to read them.

On the other hand, when I started listening to an SF book, I finished 17 hours of audio material in a few days. I just forgot that I enjoy a different topic and a different medium.

This happens to me more often than I would like to admit. We love doing something and we forget about doing it. We forget to schedule time for joy and remaining sane. We forget we are more complex and enjoy variety. We default to the same way of resting such as:

  • playing computer games
  • reading books
  • binge watching tv shows
  • reading programming news or Facebook

because it usually works. Perhaps because it worked for so many years on us. But at some point, it often loses its magic. Yes, you enjoy the TV show but only when the episode is great, not average. And frankly, because of recommendations, it is easier to find historically good TV shows and movies and hard to find fresh ones which are to your taste. Art is strange in that way. You can use IMDB and other platforms to watch the top 100 or top 250 best movies and enjoy them a lot. But then anything next is just… not so good 🙂 For me and computer games, it was hard to enjoy anything after The Witcher 3.

So last week I was like… fuck it. I gotta do something different, go somewhere different and break my habits. I scheduled a 2-hour ride to a different city, spent half a day in an aqua park and enjoyed the water slides like a baby. The next day it was ice skating.

I noticed that I often tend to focus too strongly on intellectual ways of having joy, which can be extremely hard after hours of coding, instead of social and/or physical.

It’s good to have goals, it’s good to want to improve, be better at coding, find a better place, support your family. But when we forget about joy, when we only feel guilty about not working, not doing more, we lose a lot. Especially we lose motivation. I noticed that, surprisingly, when you give yourself more slack, more freedom, you come back to your goals and challenges more energized.

Sometimes the real answer is to forget about blogging, do something different that you love and come back later. I know it sounds trivial. That’s why I tried to provide specific examples of how I can completely screw it up.

Handle sidekiq processing when one job saturates your workers and the rest queue up

I saw a great question on reddit which I am gonna quote and try to provide a few possible answers.

Ran in to a scenario for a second or 3rd time today and I’m stumped as how to handle it.

We run a ton of stuff as background workers, pretty standard stuff, broken up in to a few priority queues.

Every now and then one of our jobs fails and starts running for a long time – usually for reasons outside of our control – our connection to S3 drops or as it happened today – our API connection to our mail system was timing out.

So jobs that normally run in a second or two are now taking 60 seconds and holding a worker for that time. Enough of those jobs quickly saturate our available workers and no other work gets done. The 60 second timeout hits for those in-process jobs, they get shuffled to the retry queue, a few smaller jobs process through the available workers until the queued jobs pull in enough of the failing jobs to again saturate the available workers.

I’d think this would be a pattern that other systems would have and there would be a semi-obvious solution for it – I’ve come up empty handed. My thought was to separate the workers by queue and balance those on different worker jobs but then that still runs the risk of saturating a specific queue’s workers.

Here are your options:

  • Lower your timeouts

    Keep monitoring averages and percentiles of how long it takes to finish a certain job in your system (using chillout or any other metric collector). This will give you a better insight into how long is normal for this task to take and what timeout you should set.

    Prefer using configurable, lower-level network timeouts provided directly by libraries over the Timeout module.

  • Pause a queue.

    Keep the troublesome job on a separate queue. Use Sidekiq Pro. When lots of jobs are failing or taking too long, just pause the queue. Great feature. Saved our ass a few times.

  • Partition your queues into many machines or processes.

    Have machine one work on queues A,B,C,D and machine two work on queues E,F,G,H.

  • Use Circuit Breaker pattern.

    A circuit breaker is used to detect failures and encapsulates the logic of preventing a failure from recurring constantly.

  • Keep your queues in two reverse orders

    I am not sure if that’s possible with Sidekiq but it was possible with Resque. Most of our machines were processing jobs in normal priority: A,B,C,D,E,F,G. But there was one machine configured to process them in reverse: G,F,E,D,C,B,A.

    That way if job D started being problematic then A-C was covered by most machines and G-E was covered by the other machine. Because even if jobs in last queue are least important in your system, you generally don’t want them to be starved but rather keep processing them albeit more slowly.

  • Increase number of threads per worker.

    If most of your tasks are IO bound (usually on networking) then you might increase number of threads processing them as your CPU is probably not utilized fully.
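
For the first option in the list above, here is a minimal sketch of library-level timeouts using Net::HTTP from the standard library. The host name and the helper name build_mail_client are made up for this example, and the timeout values are placeholders that you would derive from your own metrics.

```ruby
require "net/http"

# Hypothetical helper (assumption). Nothing is sent over the network here;
# a connection is only opened when a request such as #get is made.
def build_mail_client(host = "mail.example.com")
  http = Net::HTTP.new(host, 443)
  http.use_ssl = true
  http.open_timeout = 2   # seconds to wait for the connection to open
  http.read_timeout = 5   # seconds to wait for each read from the socket
  http.write_timeout = 5  # seconds to wait for each write (Ruby 2.6+)
  http
end

client = build_mail_client
puts client.open_timeout  # 2
puts client.read_timeout  # 5

# In a job, a slow API would then fail fast instead of holding a worker:
#
#   begin
#     client.get("/status")
#   rescue Net::OpenTimeout, Net::ReadTimeout
#     # let Sidekiq retry the job later
#     raise
#   end
```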

Let me know if you have other ways to handle such a situation.
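
The circuit breaker option mentioned above can be sketched in a few lines. This is a toy version under stated assumptions: all names are hypothetical, the state lives in one object in memory, and it is not thread-safe, so in a real multi-threaded Sidekiq process you would rather reach for an existing gem or add locking.

```ruby
# Minimal circuit breaker sketch (hypothetical class, not thread-safe).
class CircuitBreaker
  OpenError = Class.new(StandardError)

  def initialize(failure_threshold: 3, reset_after: 30, clock: -> { Time.now.to_f })
    @failure_threshold = failure_threshold
    @reset_after = reset_after   # seconds the circuit stays open
    @clock = clock               # injectable so time can be faked
    @failures = 0
    @opened_at = nil
  end

  # Runs the block unless the circuit is open; counts failures and opens
  # the circuit once the threshold is reached.
  def call
    raise OpenError, "circuit open, skipping call" if open?
    result = yield
    @failures = 0
    @opened_at = nil
    result
  rescue OpenError
    raise
  rescue StandardError
    @failures += 1
    @opened_at = @clock.call if @failures >= @failure_threshold
    raise
  end

  def open?
    return false unless @opened_at
    @clock.call - @opened_at < @reset_after
  end
end

now = 0.0
breaker = CircuitBreaker.new(failure_threshold: 2, reset_after: 30, clock: -> { now })

2.times do
  begin
    breaker.call { raise IOError, "mail API timed out" }
  rescue IOError
    # the job would fail and be retried as usual
  end
end

puts breaker.open?  # true: further calls fail fast with CircuitBreaker::OpenError
now = 31.0
puts breaker.open?  # false: the next call is allowed through again
```

While the circuit is open, jobs fail immediately instead of waiting for a timeout, so they stop holding workers that other jobs need.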

How to safely store API keys in Rails apps

Inspired by a question on reddit: Can you store user API keys in the database? I decided to elaborate just a little bit on this topic.

Assuming you want to store API keys (or passwords for SSL certificate files), what are your options? What are the pros and cons in each case?

Save directly in codebase

How?

    # config/environments/production.rb
    config.mailchimp_api_key = "ABCDEF"

Cons:

  • Won’t work with dynamic keys provided by users of your app
  • Every developer working on your app knows the API keys. This can bite you later when that person leaves or is fired. And I doubt you rotate your API keys regularly. That includes every notebook your developers have, which can be stolen (make sure it has an encrypted disk) or otherwise accessed.
  • Every 3rd party app has access to this key. That includes all those cloud-based apps for storing your code, rating your code, or CIs running the tests. Even if you never have a leak, you can’t be sure they won’t have a security breach one day. After all, they are a very good target.
  • Wrong server configuration can lead to exposing this file. There have been historical cases where attackers used ../../something/else as file names or parameter names to read certain files on servers. Not that likely in a Rails environment, but who knows.
  • In short: when the project code is leaked, your API key is leaked.
  • Least safe

Save in ENV

How:

config.mailchimp_api_key = ENV.fetch('MAILCHIMP_API_KEY') 

Pros:

  • Relatively easy. On Heroku you can configure production variables in their panel. For development and test environment you can use dotenv which will set environment based on configuration files. You can keep your development config in a repository and share it with your whole team.

Cons:

  • Won’t work with dynamic keys provided by users of your app
  • If your ENV leaks due to a security bug, you have a problem.
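
A side note on why the example uses ENV.fetch rather than ENV[]: fetch raises KeyError when the variable is missing, so a misconfigured server fails at boot instead of sending requests with a nil key later. Here is a small sketch; the helper name is hypothetical and plain hashes stand in for ENV so the example does not touch your real environment (Hash#fetch and ENV.fetch behave the same way in this respect).

```ruby
# Hypothetical helper (assumption): reads the key from ENV by default,
# but accepts any object responding to #fetch to keep the example isolated.
def mailchimp_api_key(env = ENV)
  env.fetch("MAILCHIMP_API_KEY")
end

configured = { "MAILCHIMP_API_KEY" => "ABCDEF" }
missing    = {}

puts mailchimp_api_key(configured)  # ABCDEF

begin
  mailchimp_api_key(missing)
rescue KeyError => e
  # Fails loudly and early, naming the missing variable.
  puts "boot aborted: #{e.message}"
end

# By contrast, [] silently returns nil and the problem shows up much later.
p missing["MAILCHIMP_API_KEY"]  # nil
```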

Save in DB

How

    class Group < ApplicationRecord
    end

    Group.create!(name: "...", mailchimp_api_key: "ABCDEF")

Pros

  • Easy
  • Works with dynamic keys

Cons

  • If you ever send Group as json, via API, or serialize to other place, you might accidentally leak the API key as well. Take caution to avoid it.
  • If your database or database backup leaks, the keys leak as well. This can especially happen when developers download backups or use them for development.
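
One way to reduce the risk of the accidental serialization leak mentioned above is to strip sensitive attributes in a single place. Below is a plain-Ruby sketch of the idea. GroupRecord is a hypothetical stand-in for the Group model; in a real Rails app you would more likely pass except: [:mailchimp_api_key] to as_json or use a dedicated serializer.

```ruby
require "json"

# Hypothetical plain-Ruby stand-in for the Group model (assumption).
class GroupRecord
  SENSITIVE = [:mailchimp_api_key].freeze

  def initialize(attributes)
    @attributes = attributes
  end

  # Everything that leaves the process goes through this method,
  # so sensitive attributes are dropped in exactly one place.
  def public_attributes
    @attributes.reject { |name, _value| SENSITIVE.include?(name) }
  end

  def to_json(*args)
    public_attributes.to_json(*args)
  end

  # Keep the key out of logs and console output as well.
  def inspect
    "#<GroupRecord #{public_attributes.inspect}>"
  end
end

group = GroupRecord.new(name: "Newsletter", mailchimp_api_key: "ABCDEF")
puts group.to_json  # {"name":"Newsletter"}
```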

Save in DB and encrypt (secret in code or in ENV)

How

    class Group < ApplicationRecord
      attr_encrypted_options.merge!(key: ENV.fetch('ATTR_ENCRYPTED_SECRET'))
      attr_encrypted :mailchimp_api_key
    end

    Group.create!(name: "...", mailchimp_api_key: "ABCDEF")

Pros

  • For the sensitive API key to be leaked, two things need to happen:
    • DB leak
    • ENV or code leak, which contain the secret you use for encryption
  • If only one of them happens, that’s not enough.
  • The safest approach

Cons

  • A bit more complicated, but not much
  • Your tests might be a bit slower when you strongly encrypt/decrypt in the most important models, which are used a lot
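
To make the "two things need to happen" argument concrete, here is a sketch of the general idea using OpenSSL from the standard library. It is not how attr_encrypted is implemented; encrypt_value and decrypt_value are hypothetical helpers, and the key is generated on the spot, whereas in an app it would come from ENV or another secret store.

```ruby
require "openssl"

# Encrypts with AES-256-GCM and returns a Base64 string holding
# iv (12 bytes) + auth tag (16 bytes) + ciphertext, ready for a DB column.
def encrypt_value(plaintext, key)
  cipher = OpenSSL::Cipher.new("aes-256-gcm")
  cipher.encrypt
  cipher.key = key
  iv = cipher.random_iv
  ciphertext = cipher.update(plaintext) + cipher.final
  [iv + cipher.auth_tag + ciphertext].pack("m0")
end

# Reverses encrypt_value; raises OpenSSL::Cipher::CipherError when the key
# is wrong or the stored value was tampered with.
def decrypt_value(encoded, key)
  raw = encoded.unpack1("m0")
  iv, tag, ciphertext = raw[0, 12], raw[12, 16], raw[28..]
  cipher = OpenSSL::Cipher.new("aes-256-gcm")
  cipher.decrypt
  cipher.key = key
  cipher.iv = iv
  cipher.auth_tag = tag
  (cipher.update(ciphertext) + cipher.final).force_encoding("UTF-8")
end

key = OpenSSL::Random.random_bytes(32)   # stand-in for the secret kept in ENV
stored = encrypt_value("ABCDEF", key)    # this is what would land in the DB

puts stored.include?("ABCDEF")     # false: a DB leak alone reveals nothing
puts decrypt_value(stored, key)    # ABCDEF
```

Someone holding only the database dump sees the Base64 blob, and someone holding only the secret has nothing to decrypt.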

Use encrypted Rails secrets

How

  • Configure RAILS_MASTER_KEY env variable on your development and production environment
  • Edit config/secrets.yml.enc using bin/rails secrets:edit and commit + push the changes
  • Set config.read_encrypted_secrets = true at least in config/environments/production.rb
  • Use Rails.application.secrets in the application code

Pros

  • Your API keys are encrypted
  • The keys are versioned using your version control system such as Git

Cons

  • Does not work with dynamic keys

Would you like to continue learning more?

If you enjoyed the article, subscribe to our newsletter so that you are always the first one to get the knowledge that you might find useful in your everyday Rails programmer job.

Content is mostly focused on (but not limited to) Ruby, Rails, Web-development and refactoring Rails applications.

Also, make sure to check out our latest book Domain-Driven Rails. Especially if you work with big, complex Rails apps.

The easiest posts to write for a programming blog

Here are the 4 easiest types of posts that you can write about on your programming blog:

  • before/after
  • show some code, explain
  • explain how you solved a problem
  • opinion on another blog-post

The purpose of this list is not for you to always write the easiest posts, but to know that not every post needs to be a bible, a guide or a very deep dive. Shorter forms are welcomed nicely in programming communities and do not require a tremendous amount of time to read. They have their own, valuable place in your blogging style. Especially at the beginning of your blogging journey when you are not an expert yet. And especially when you want to build a habit and keep writing regularly, but you don’t feel inspired or don’t have the time for anything very long.

I wanted to elaborate a bit more about show some code and explain technique presented by Andrzej in the video.

I want you to remember, it does not need to be your code. Sometimes you don’t feel inspired by the code you wrote recently in your daily work. Don’t force yourself.

Think about your favorite gem, package, library. Go to GitHub (usually) to check its code. Find lib or src or another directory with code. Find a filename which sounds important or interesting and start reading. Don’t assume you are going to understand the code easily, but try to get an understanding of what this class/function/file does and how it fits in the whole library that you already like and use. Spend as long doing it as you want.

Usually we don’t like jumping into unknown code as programmers, and reading someone else’s code for the first time can be challenging. However… most popular open source libraries usually maintain good quality because of the whole community effort. When you browse the code of a library you use and like, you already know more or less its top-level API exposed for programmers. So this is not completely strange code to you.

I guarantee the feeling won’t be the same as diving for the first time into a new, legacy project that your company got 🙂 Although that can be interesting as well. The feeling will be more refreshing. More curiosity instead of worrying. After all, you don’t need to maintain this codebase. You are just passing by, trying to learn something new and interesting that will make you a better programmer. Something worth sharing with your readers.

You will be doing a few things at the same time:

  • becoming a better programmer by reading and learning from good code
  • becoming familiar with internals of interesting code
  • promoting a library you like
  • teaching other developers about programming techniques and bits of code from that library which inspired you

Helping others and helping yourself at the same time. Everybody wins.

Here is how I did it last time.

Using influxdb with ruby

InfluxDB is an open-source time series database, written in Go. It is optimized for fast, high-availability storage and retrieval of time series data in fields such as operations monitoring, application metrics, and real-time analytics.

We use it in chillout for storing business and performance metrics sent by our collector.

The InfluxDB storage engine looks very similar to an LSM Tree. It has a write ahead log and a collection of read-only data files which are similar in concept to SSTables in an LSM Tree. TSM files contain sorted, compressed series data.

If you wonder how it works, I can give you a very quick tour based on the InfluxDB Storage Engine documentation and on what I’ve learnt from the Data Structures That Power Your Database part of Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems.

  1. First, arriving data is written to a WAL (Write Ahead Log). The WAL is a write-optimized storage format that allows for writes to be durable, but not easily queryable. Writes to the WAL are appended to segments of a fixed size.

    The WAL is organized as a bunch of files that look like _000001.wal. The file numbers are monotonically increasing and referred to as WAL segments. When a segment reaches a certain size, it is closed and a new one is opened.

  2. The database has an in-memory cache of all the data written to the WAL. In case of a crash and restart, this cache is recreated from scratch based on the data written to the WAL files.

    When a write comes in, it is written to a WAL file, synced and added to an in-memory index.

  3. From time to time (based on both size and time interval) the cache of the latest data is snapshotted to disk (as a Time-Structured Merge Tree file).

    The DB then also needs to clear the in-memory cache and can clear the WAL file.

    The structure of these TSM files looks very similar to an SSTable in LevelDB or other LSM Tree variants.

  4. In the background, these files can be compacted and merged together to form bigger files.
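To make the four steps above more tangible, here is a toy model of that write path in plain Ruby. It is only a sketch of the idea: everything lives in memory, there is no compression or real file I/O, and the `ToyEngine` class with all its method names is made up for this post rather than taken from InfluxDB.

```ruby
# Toy model of the write path: WAL -> in-memory cache -> sorted snapshots.
# ToyEngine and its method names are hypothetical, not InfluxDB's API.
class ToyEngine
  attr_reader :wal, :cache, :tsm_files

  def initialize(snapshot_threshold: 3)
    @snapshot_threshold = snapshot_threshold
    @wal       = []  # append-only log: durable, but not easy to query
    @cache     = {}  # series => [[timestamp, value], ...]
    @tsm_files = []  # read-only snapshots, sorted by series and time
  end

  # Steps 1 and 2: append to the WAL first, then update the cache.
  def write(series, timestamp, value)
    @wal << [series, timestamp, value]
    (@cache[series] ||= []) << [timestamp, value]
    snapshot if @wal.size >= @snapshot_threshold
  end

  # Step 2: after a crash the cache is rebuilt by replaying the WAL.
  def recover
    @cache = {}
    @wal.each do |series, timestamp, value|
      (@cache[series] ||= []) << [timestamp, value]
    end
  end

  # Step 3: freeze the cache into a sorted, read-only "file",
  # then clear both the cache and the WAL.
  def snapshot
    return if @cache.empty?
    file = @cache.sort.map { |series, points| [series, points.sort.freeze] }
    @tsm_files << file.to_h.freeze
    @cache = {}
    @wal   = []
  end

  # Step 4: merge all snapshot files into one bigger file.
  def compact
    return if @tsm_files.size < 2
    merged = Hash.new { |hash, series| hash[series] = [] }
    @tsm_files.each do |file|
      file.each { |series, points| merged[series].concat(points) }
    end
    merged.each_value(&:sort!)
    @tsm_files = [merged.sort.to_h]
  end
end

engine = ToyEngine.new(snapshot_threshold: 2)
engine.write("orders", 2, 10.0)
engine.write("orders", 1, 5.0)   # reaches the threshold, triggers a snapshot
engine.tsm_files.first           # points for "orders", now sorted by timestamp
```

The real engine decides when to snapshot based on both size and time, while this sketch only counts WAL entries to keep things short.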

The documentation has a nice historical overview of how previous versions of InfluxDB tried to use LevelDB and BoltDB as underlying engines, but they were not enough for the most demanding scenarios.

I must admit that I never really understood very deeply how DBs work under the hood and what the differences between them are (from the point of view of underlying technology and design, not from the point of view of APIs, query languages, and features).

The book that I mentioned, Designing Data-Intensive Applications, really helped me understand it.

Let’s go back to using InfluxDB in Ruby.

influxdb-ruby gem

For me personally, influxdb-ruby gem seems to just work.

writes

    require 'influxdb'

    influxdb = InfluxDB::Client.new
    # write_point takes the series name first, then the point data
    influxdb.write_point('orders', {
      values: {
        started: 1,
        number_of_products: 4,
        total_amount: 55.70,
        tax: 5.70,
      },
      tags: {
        country: "USA",
        terminal: "KATE-123",
      }
    })

The difference between tags and values is that tags are always automatically indexed.

Queries that use field values as filters must scan all values that match the other conditions in the query. As a result, those queries are not performant relative to queries on tags.
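If the difference feels abstract, the following toy sketch may help. It keeps an index for tags and none for field values, and counts how many points each kind of filter has to touch. `TagIndexedSeries` is a made-up class for illustration only and says nothing about how InfluxDB actually lays out its index.

```ruby
# Toy illustration of why filtering on tags is cheaper than on fields.
# TagIndexedSeries is hypothetical, not part of InfluxDB or influxdb-ruby.
class TagIndexedSeries
  attr_reader :scanned

  def initialize
    @points    = []
    # [tag name, tag value] => positions of matching points
    @tag_index = Hash.new { |hash, key| hash[key] = [] }
    @scanned   = 0
  end

  def write(values:, tags:)
    @points << { values: values, tags: tags }
    tags.each { |tag, value| @tag_index[[tag, value]] << @points.size - 1 }
  end

  # Tag filter: one index lookup, only matching points are touched.
  def where_tag(tag, value)
    @tag_index.fetch([tag, value], []).map { |position| touch(position) }
  end

  # Field filter: every point has to be inspected.
  def where_field(field)
    @points.each_index
           .map { |position| touch(position) }
           .select { |point| yield point[:values][field] }
  end

  private

  def touch(position)
    @scanned += 1
    @points[position]
  end
end

orders = TagIndexedSeries.new
orders.write(values: { total_amount: 55.70 }, tags: { country: "USA" })
orders.write(values: { total_amount: 12.00 }, tags: { country: "PL" })
orders.write(values: { total_amount: 80.00 }, tags: { country: "USA" })

orders.where_tag(:country, "USA")                            # touches 2 points
orders.where_field(:total_amount) { |amount| amount > 50 }   # touches all 3
```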

reads

The InfluxQL query language (similar to SQL, but not quite) really shines when it comes to returning data grouped by time periods (notice GROUP BY time(1d)), which is great for metrics and visualizations.

raw data using influxdb console

    SELECT
      sum(completed)/sum(started) AS ratio
    FROM orders
    WHERE time >= '2017-07-05T00:00:00Z'
    GROUP BY time(1d)

    name: orders
    time                ratio
    ----                -----
    1499212800000000000 0.8
    1499299200000000000 0.7
    1499385600000000000 0.6

where Time.at(1499212800).utc is 2017-07-05 00:00:00 UTC and Time.at(1499299200).utc is 2017-07-06 00:00:00 UTC.
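If you wonder what GROUP BY time(1d) roughly does with the points, here is a plain Ruby sketch of the same idea: floor every timestamp to the start of its UTC day and aggregate per bucket. The helper names and the sample points are made up for illustration, and this says nothing about how InfluxDB computes it internally.

```ruby
SECONDS_PER_DAY = 86_400

# The console prints nanosecond epoch timestamps; convert one to a UTC Time.
def time_from_nanoseconds(nanoseconds)
  Time.at(nanoseconds / 1_000_000_000).utc
end

# Floor each timestamp to the start of its UTC day, then compute
# sum(completed)/sum(started) for every bucket.
def ratio_by_day(points)
  points
    .group_by { |point| point[:time].to_i / SECONDS_PER_DAY * SECONDS_PER_DAY }
    .sort
    .map do |bucket_start, bucket|
      started   = bucket.sum { |point| point[:started] }
      completed = bucket.sum { |point| point[:completed] }
      [Time.at(bucket_start).utc, completed.fdiv(started)]
    end
end

# Made-up sample data, not taken from a real database.
points = [
  { time: Time.utc(2017, 7, 5, 9, 30), started: 3,  completed: 2 },
  { time: Time.utc(2017, 7, 5, 18, 0), started: 2,  completed: 2 },
  { time: Time.utc(2017, 7, 6, 11, 0), started: 10, completed: 7 },
]

ratio_by_day(points)
# => one [day, ratio] pair per day: 0.8 for July 5th and 0.7 for July 6th
```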

influxdb-ruby

Using the gem you can easily query for the data using InfluxQL and get these values nicely formatted.

    influxdb.query "select sum(completed)/sum(started) as ratio FROM orders WHERE time >= '2017-07-05T00:00:00Z' group by time(1d)"

    [{
      "name"=>"orders",
      "tags"=>nil,
      "values"=>[
        {"time"=>"2017-07-05T00:00:00Z", "ratio"=>0.8},
        {"time"=>"2017-07-06T00:00:00Z", "ratio"=>0.7},
        {"time"=>"2017-07-07T00:00:00Z", "ratio"=>0.6}
      ]
    }]
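A result in this shape is easy to turn into rows for a chart. Below is a small sketch that works on a hard-coded copy of the result above, so it runs without a database. `chart_rows` is a hypothetical helper of mine, not something the gem provides.

```ruby
require 'time'

# Hard-coded copy of the query result shown above.
result = [{
  "name"   => "orders",
  "tags"   => nil,
  "values" => [
    { "time" => "2017-07-05T00:00:00Z", "ratio" => 0.8 },
    { "time" => "2017-07-06T00:00:00Z", "ratio" => 0.7 },
    { "time" => "2017-07-07T00:00:00Z", "ratio" => 0.6 },
  ]
}]

# Turns every returned row into a [label, value] pair.
# chart_rows is a made-up helper, not part of influxdb-ruby.
def chart_rows(result, field)
  result.flat_map do |series|
    series.fetch("values").map do |row|
      [Time.iso8601(row.fetch("time")).strftime("%Y-%m-%d"), row.fetch(field)]
    end
  end
end

chart_rows(result, "ratio")
# => [["2017-07-05", 0.8], ["2017-07-06", 0.7], ["2017-07-07", 0.6]]
```

I used fetch instead of [] on purpose, so that a typo in the field name raises a KeyError instead of silently drawing an empty chart.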

What for?

For dashboards and graphs, monitoring and alerting. That covers business metrics as well as performance metrics (monitoring HTTP and Sidekiq).

Tracking dead code in Rails apps with metrics

When you work on a big Rails application, sometimes you would like to remove certain lines of code or even whole features. But often you are not completely sure if they are truly unused. What can you do?

With chillout.io and other monitoring solutions that’s easy. Just introduce a new metric at the place in the code you are unsure about.

    class SocialSharesController < ApplicationController
      def friendster
        Chillout::Metric.track('SocialSharesController#friendster')

        # normal code
      end
    end

After you add a graph to your panel, you can easily configure an alert with notifications to Slack, email or whatever you prefer, so that you are pinged if this code is executed.


Wait an appropriate amount of time such as a few days or weeks. Make sure the code was not invoked and talk to your business client, boss, CTO or coworkers to make the final call that the feature should be dropped. Now you have the arguments.
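If you want to play with the idea before wiring up any monitoring service, a tiny in-memory stand-in is enough. The sketch below is only that: `DeadCodeProbe` and the `SocialShares` class are hypothetical names invented for this example, the counters disappear when the process restarts, and in a real app the hit would go to your metrics backend instead.

```ruby
# In-memory stand-in for a metrics client. DeadCodeProbe is hypothetical;
# it only demonstrates the "count hits on suspect code" idea.
module DeadCodeProbe
  @hits  = Hash.new(0)
  @mutex = Mutex.new

  class << self
    # Call this from the code you suspect to be dead.
    def track(name)
      @mutex.synchronize { @hits[name] += 1 }
    end

    def hits(name)
      @mutex.synchronize { @hits[name] }
    end

    # Of the given suspects, returns those that were never executed.
    def never_called(suspects)
      suspects.select { |name| hits(name).zero? }
    end
  end
end

# Plain Ruby class standing in for a controller.
class SocialShares
  def friendster
    DeadCodeProbe.track('SocialShares#friendster')
    :shared_on_friendster
  end

  def twitter
    DeadCodeProbe.track('SocialShares#twitter')
    :shared_on_twitter
  end
end

SocialShares.new.twitter

DeadCodeProbe.never_called(['SocialShares#friendster', 'SocialShares#twitter'])
# => ["SocialShares#friendster"]
```

A name that shows up in never_called after a long enough observation period is a candidate for that conversation with your client or CTO, not an automatic proof that the code is dead.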

We all know that unused code is a burden for our whole team: we keep supporting and refactoring it (yes, sometimes we do renames or upgrades and spend time on code delivering no value, don’t we?), and it keeps appearing in search results and occupying space in our minds.

As Michael Feathers nicely explained:

No, to me, code is inventory. It is stuff lying around and it has substantial cost of ownership. It might do us good to consider what we can do to minimize it.

I think that the future belongs to organizations that learn how to strategically delete code. Many companies are getting better at cutting unprofitable features in their products, but the next step is to pull those features out by the root: the code. Carrying costs are larger than we think. There’s competitive advantage for companies that recognize this.

Or as Eric Lee put it:

However, the code itself is not intrinsically valuable except as tool to accomplish some goal. Meanwhile, code has ongoing costs. You have to understand it, you have to maintain it, you have to adapt it to new goals over time. The more code you have, the larger those ongoing costs will be. It’s in our best interest to have as little source code as possible while still being able to accomplish our business goals.

Or as James Hague expressed it:

To a great extent the act of coding is one of organization. Refactoring. Simplifying. Figuring out how to remove extraneous manipulations here and there.