Friday, December 16, 2016

How to safely kill your NTP server

NTP is the service that typically runs quietly in the background on almost every computer in the world. When a computer boots for the first time, NTP gets configured and is almost never considered again. Why? Because for most people, it just works. Until the end of this century (Y2K + 100) or the next leap second, most system administrators will happily not need to think about the wonders that NTP gives us. There are, of course, always security considerations to make, but again, once those are addressed, NTP fades to the back of your mind.

If you are running your servers in Amazon, it's possible that your VPC doesn't permit access to external NTP peers. In this case, you may follow directions to set up your own NTP server. These were the exact directions my sys admin predecessor followed for the VPC I inherited at my current job. 

As it turns out, every one of the servers in our VPC needs external internet access, which we route through a NAT. So, other than the security considerations of allowing traffic over port 123, there was no good reason not to rely on the public NTP peers. Great! That meant we could kill our internal NTP server and save time and money by not hosting it.

So, how do you safely stop and shut down your NTP server?

First up, the basics:
  1. Run a loop across all your servers and `grep 'server' /etc/ntp.conf | grep <ip|name>`. I just used a bash for-loop.
    1. If you have any matches, use a similar bash loop to replace the conf file and `sudo service ntpd stop; sudo service ntpd start`.
    2. Then run `ntpq -p` to confirm connections can be established with the upstream peers.
  2. Check your images and the scripts that build them. If necessary, rebuild them.
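The sweep above can be sketched as a pair of bash helpers. This is a sketch under assumptions: `hosts.txt` is a hypothetical inventory file, `10.0.0.5` is a placeholder address for the internal NTP server, and the replacement conf is assumed to be staged at `/tmp/ntp.conf.new` on each host.

```shell
# Return success if the given ntp.conf points at the internal NTP server.
uses_internal_ntp() {
  local conf="$1" internal="$2"
  grep '^server' "$conf" | grep -q "$internal"
}

NTP_HOST="10.0.0.5"   # placeholder: the internal server being retired

# Sweep every host; replace the conf and bounce ntpd where needed.
sweep() {
  while read -r host; do
    if ssh "$host" "grep '^server' /etc/ntp.conf | grep -q '$NTP_HOST'"; then
      echo "$host still points at $NTP_HOST; replacing conf"
      ssh "$host" "sudo cp /tmp/ntp.conf.new /etc/ntp.conf &&
                   sudo service ntpd stop && sudo service ntpd start"
      ssh "$host" "ntpq -p"   # confirm upstream peers are reachable
    fi
  done < hosts.txt
}
# sweep   # uncomment to run against your inventory
```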

Now you can feel pretty confident that everything you know about is configured properly. But what about what you don't know about? Here's what I did:

  1. ssh ntp-server
  2. sudo service ntpd stop
  3. screen
  4. sudo nc -vulk -w 1 123
This sets up a dumb netcat UDP listener on NTP port 123 that logs all connections to the console. The `-w 1` is the only way I could contrive to prevent the first connection from commandeering the process: it times out each connection after 1 second, and combined with `-k`, netcat keeps listening for new connections. The screen session is simply for the safety of knowing that this will keep running when I close my laptop.

Let this run for a day; if you see any traffic in there, you'll know the IP address of any server still trying to reach you for NTP info.
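To make the day-long capture easier to review, the listener's output can be timestamped and teed to a file. A small sketch, assuming the log path is yours to choose (the `stamp` helper and `/var/log/ntp-stragglers.log` are illustrative, not from the original setup):

```shell
# Prefix every line read on stdin with an ISO-8601 timestamp.
stamp() {
  while IFS= read -r line; do
    printf '%s %s\n' "$(date -Is)" "$line"
  done
}

# Hypothetical usage, run inside the screen session:
#   sudo nc -vulk -w 1 123 2>&1 | stamp | tee -a /var/log/ntp-stragglers.log
```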

Tuesday, July 12, 2016

git log exit code 141?

Something that's been bugging me for a while now is that when I run a `git log` command and quit before scrolling through all the output, my terminal prompt shows my last command (the git log) exited non-zero with status code 141.

I have git configured to use `less -+S` as the pager.

The answer turns out to be really straightforward, but took me a little bit of internet searching. I am capturing the learning here to save my future self the research and possibly anyone else typing the exact same thing into Google that I did ;-)

The explanation:

  • Git pipes the log output into less
  • When I quit the less process, the read end of the pipe closes; the next time git writes to the pipe, the kernel sends it SIGPIPE (signal 13)
  • git exits prematurely, and per POSIX convention the shell reports 128 + the signal number, i.e. 128 + 13 = 141, to indicate the process was terminated by signal 13
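The same 141 can be reproduced without git at all; any writer whose reader quits early gets the same treatment. A minimal bash demonstration:

```shell
# `yes` writes lines forever; `head` reads one line and exits, closing the pipe.
# The next write from `yes` draws SIGPIPE (signal 13), so bash reports 128 + 13.
yes | head -n 1 > /dev/null
echo "${PIPESTATUS[0]}"   # exit status of `yes`: 141
```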
The background:

Thursday, June 2, 2016

Yes, git, I did mean that

Git comes out of the box with a nice "auto correct" feature, but sometimes it's not correct. And other times, the first suggestion isn't the one you intended, but the second or third.

Here's a short, fun shell function (tested in zsh and bash) that will run your last git command and figure out what you meant to do and then run it for you.

Parity with auto-correct

$ git dif
git: 'dif' is not a git command. See 'git --help'.

Did you mean this?

$ idid !!
<shows diff>

Choose non-default auto-correct suggestion

$ git pu
git: 'pu' is not a git command. See 'git --help'.

Did you mean one of these?

$ idid -2 !!
<runs git push>
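The post relies on a shell function named `idid` that isn't shown here, so below is a hypothetical reconstruction of how such a function could work. The helper name `_git_suggestion` and the reliance on git printing its suggestions one per indented line are my assumptions, not the original implementation.

```shell
# Pull the Nth suggestion out of git's "Did you mean" error output.
# git prints candidate commands one per line, indented with whitespace.
_git_suggestion() {
  local n="$1"
  grep -E '^[[:space:]]+[[:alnum:]-]' | sed -n "${n}p" | tr -d '[:space:]'
}

# Rerun the last (failed) git command and execute git's Nth suggestion.
# Usage: idid [-N] !!   (the shell expands !! to the failed command)
idid() {
  local n=1
  case "$1" in
    -[0-9]*) n="${1#-}"; shift ;;
  esac
  local git_cmd="$1" bad_sub="$2"
  shift 2
  local fix
  fix="$("$git_cmd" "$bad_sub" "$@" 2>&1 | _git_suggestion "$n")"
  if [ -n "$fix" ]; then
    "$git_cmd" "$fix" "$@"
  else
    echo "idid: no suggestion found" >&2
    return 1
  fi
}
```

With this in a shell rc file, `idid !!` reruns the failed command, scrapes the first suggestion, and executes it; `idid -2 !!` picks the second.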

Thursday, May 19, 2016

Starting an Open Source Office within your Company

I had the privilege of sharing the stage yesterday with other members of the TODO Group to share our reflections from running Open Source Offices within our respective companies. See the post on the TODO blog.

In my portion of the talk, I focused on the questions and challenges that might face someone attempting to start up an Open Source presence for their company.

Grass Roots?

The first thing you're faced with deciding is whether to start from the bottom up or the top down. Ultimately, you'll want executive support from the top, but when you're just getting started you can function without it. Instead, find a good first project to be the poster child. Use this to build excitement and draw attention. Within your engineering team, it's very likely that most folks will be interested in Open Source and want to be involved. It's also just as likely that they'll have no idea how to do that. The poster child will serve as an example of not only how to do it, but that it is possible.

Start working on how to fit thinking about Open Source into the Software Development Life Cycle, for both outgoing open source and incoming open source software.

Find someone who will champion OSS for you, be that person, or share the role.

Good Intentions

The percentage of folks who tell you they are interested in sharing their software is far greater than the percentage of folks who will put in the time to share it. Realizing this upfront will save you a lot of disappointment and undelivered work.

Figure out how to embed OSS into the DNA of your new projects. Learn the build systems that your developers use and make it easy for them to fork small, cohesive modules into open source ready projects that can be composed back into their larger, internal work. For example, break out small python libraries into pypi projects.

Straight Outta Compton

Open sourcing a software project successfully as a company is more than just creating amazing code. There is no trivial amount of red tape to traverse, from legal to security to IP. And after that there is promotion. Imagine if Eazy E had never met Jerry Heller. There might never have been an NWA as we know it, or modern day rap. The role of the Open Source Office is to be for your developers what Jerry Heller was to NWA: the guide through all the red tape and the promotional process.

Who Cares?

Success with OSS is a living thing and it needs to be tended. It won't be on the roadmap for most teams, and it won't be a behavior for which time is allotted. In other words: it's unlikely that most developers will spend time on OSS of their own accord.

The role of the OSS office is to continually drive engagement within the company. Regularly promote successes in the community or within the company. Maybe you just got a great new hire because of an OSS project or maybe one of your engineers was just recognized in the tech news. Maybe a talented employee previously on the verge of leaving renewed their commitment to the company because of engagement with OSS. Share these successes with the whole engineering team or within the leadership group.

Make the time to meet with team managers to see what projects are coming up in their backlogs. Brainstorm how OSS could be a part of that.


Finally, it's critical to keep your community happy. Never release a project without thinking about promotion and adoption, easy install, good README, etc. Keep an eye on the metrics on your projects. Celebrate healthy projects. Respond to GitHub Issues in a timely manner. Encourage pull requests! Coach your developers to be polite, be thoughtful, and to engage with the community. Create mailing lists. Consider adopting a code of conduct. Be a responsible member of the OSS community and remember that as an enterprise / company, you are setting an example and representing more than simply your company.

Final Note

These were the five points that I found most prominent from my time running Open Source at Box. However, the list of other reflections is much longer. Please see the presentation for some of the other takeaways from other members of the TODO Group. The role of the Open Source Office is critical to any company interested in being part of the OSS community, whether as a consumer or producer.

Saturday, April 2, 2016

Building Successful Teams (it's not just your manager's job)

Over the past year and a half I've been managing a small team of software developers. This past week I was reflecting on some of the practices we've developed as a team that I feel have contributed to our success as both a team and as individuals. Notwithstanding the likelihood that these are not new ideas, I wanted to collect my own reflections in a single, memorable-to-me place.

Success of the team is not measured only by the projects we have completed, but also by the steps each of the team members has made in their careers, personally and professionally. I feel that the practices outlined below are designed to achieve both of these aspects of success.

Our Practices

A rough overview of our formula is below. I've tried to keep the descriptions concise, but several of these points do merit their own blog posts.

Sidenote On Applicability

These practices evolved in a team of 3 people in an org of > 100 people in a department of > 300 people in a company of > 1000. There's a completely fair chance that some of these practices won't make sense in small companies (i.e. less than Dunbar's number) or very young companies.

Recurring Events

  • Weekly 1:1's with manager. Time for the manager to listen and the IC to talk. Check in on short term career development goals. Check in on impediments. Check in on any drama with other employees. For some more ideas, see Good Tips for Your 1 on 1.
  • 6-weekly 1:1 offsite lunch to discuss career fit and plan. A deeper dive into long term career goals. A time to build rapport. Again, the IC should do most of the talking; though this is also a chance for the manager to provide guidance and reflections from their own experiences.
  • 4-weekly team offsite lunch. A chance to get out of the office and just be regular people. The fresh air and less familiar surroundings will be good for your psychology.
  • 4-weekly team lunch onsite. On the in-between weeks from the offsite lunch. This just makes sure you are getting together and seeing each other's faces.
  • 6-weekly team scrum retrospective. It is amazing what you learn about your team when you do this, both good things and bad things. The regular cadence also helps with accountability.
  • Weekly status emails for celebration, retrospection, and future (i.e. next week). At the least, these go to the team. I like to include my boss and other parties who helped play a part in recent success or have some stake (approvers, consulted, informed, etc.) in ongoing work. This is adopted from the PPP Concept.


These tie a team together and provide them the necessary context to self-navigate.
  • Team vision & purpose and company fit
  • Clear short term priorities. This is generally something visible like a Scrum or Kanban board. This is good for the team itself to use, but also for anyone else interested in what your team is doing.
  • Clear long term goals. These should fit with team vision and purpose. Typically, these will be the overarching epics comprising the short term priorities.
  • Rubric for roles on the team. At its worst, absence of clearly defined expectations for each role and level can lead to feelings of unfairness or favoritism. At its best, it provides career goals for team members and acts as a guide for what is and (as importantly) is not expected of them.


  • Constant feedback, both good & bad. This builds respect in both directions. This also helps ensure transparency.
  • Fairness. Managers should be transparent about decisions made regarding other employees. If you feel you are being preferentially treated, speak up. If you notice it, your teammates probably do and doubtless will not appreciate it.
  • Outward communication to other teams about what you've accomplished. This is a step beyond the weekly status email. These happen anytime a major goal is achieved or an epic is completed. There is so much going on in most companies, that achievements get lost in the noise and too often only failures get any attention. If a manager isn't shouting from the rooftops about the team's success, then it's unlikely that anyone else will be.
  • Information pass-downs from the manager's manager. Lots of decisions get made on high, and awareness of those decisions can help your team be more successful.
  • Commitment to finish your short term priorities. New needs will come up that may be more important and vie for the team's attention & time. It's critical that you remain committed to completing your short term priorities; without this, you lose trust and faith in the team. Furthermore, knowing you'll complete your commitments makes you more thoughtful about what you'll commit to.
  • You are held accountable. Whether you're a manager or an IC, if you're not held accountable then your work will suffer.

Outside the Team

  • You should ask a manager for help finding a mentor.
  • You should identify someone in upper leadership who shares your view points and is driven by the same things as you. There will be situations when you need a second opinion or someone to back you.
  • Try to cultivate a sponsor.

Parting Thoughts

Never sacrifice these things. Skipping one or two of them once or twice might be OK, but it will accumulate into discontent, dissatisfaction, and poor performance. You must keep yourselves accountable and hold those around you accountable.

The lucky ones among us will find a manager who naturally does these things, but the rest of us can't simply claim misfortune or "being a victim" as an excuse and then give up. When it's your career on the line, it doesn't matter whose fault it is; what matters is the results you are going to get.

Additional Learning

Wednesday, November 11, 2015

What Makes a Good Engineering Culture?

This question was asked several years ago on Quora and has gotten some great answers! I was going through some of my old drafts and came across an answer that I had almost completed writing and decided to finally push through (better late than never!). I've copied the full text of the answer below.


In my experience, a company's engineering culture is possibly the most important thing a person can consider when evaluating a job offer. I've seen many Quora questions touch on similar topics (e.g. "Where should I work?", etc.), and I feel like the root of almost all of the answers stems from the culture. Thank you for asking this question.

My answer to this question is rooted in my ten years of software engineering, numerous blogs & books on the topic, and countless conversations with engineers. It also comes with the expectation that no answer to this question can speak for all software engineers. Everyone has different and changing needs, and with them different and changing views of what makes a good culture.

So, here goes -- the prevailing concepts that have stuck with me are as follows.


High quality software engineering is the product of a team. No one individual can be expected to deliver, nor take credit for, a successful product on his or her own. This gets fuzzy in small startups where there may only be one engineer, but otherwise holds true. A culture that celebrates one individual at the cost of another is making a grievous mistake.

There is an important distinction to note here about what comprises a *team* of engineers versus a group. The distinction is that a group is not a team until everyone in the group is committed to the purpose [1]. In my experience, this commitment comes from inspirational leadership and transparency. The fact that an engineer is employed by a company is not reason enough to incite the determination, dedication, and thoughtfulness necessary to produce quality code. Committed engineers are engineers who are proud to work where they work and excited to talk about their jobs and their company's products.

Stake in the Product

A healthy culture builds a product that means something to them. This can happen in a lot of ways, one of which is by connecting the engineers with the users. Some examples are providing engineering the opportunity to sit with the customer support team, to join sales or product folks on customer visits, or to attend company conventions. As an engineer, it is such a rewarding feeling to meet a customer and hear their story about how a feature you built makes their life better or how the bug you fixed made their day. This kind of connection is what makes the software you build at your job significantly more meaningful than any college project or homework assignment.

The next critical piece of this is a healthy relationship between the product and engineering teams. I've always been baffled by product management teams that treat engineers like they are incapable of contributing to product design. Product and engineering need to function together like two sides of a brain, and respect needs to flow in both directions. Giving engineers a say in the product design simultaneously gives them a stake in the outcome. People are much more likely to care about the success of something they helped design versus something they were told how to do.

Equally, a culture that does not respect the purpose of its product team is no better off than one that does not respect engineering. The truly successful cultures of which I have been a part were made up of cross-functional teams of product & engineering talent working together to set short term and long term goals; write achievable design specifications; and build software that makes both groups proud.


Next up: Experiments! Happy cultures promote experimenting with features over endlessly debating them [2]. One company where I've seen this really done well is Yammer, where each feature has a clear definition of success, but is assumed to be unnecessary until proven otherwise by objective usage metrics and customer feedback [3]. This precludes endless debates or analysis paralysis, and effectively lets product and engineering teams focus on what matters: building software. A positive side effect of writing software that inherently supports experimenting is that it encourages a culture that commits to small, manageable deliverables.

Encourage Learning

Another hallmark is a culture that encourages establishing a *deep understanding* of the tools, workflows, and responsibilities that go into producing production-ready software. A team that knows __why__ something must be done in a certain fashion will make substantially fewer mistakes than a team kept ignorant by relying too heavily on process scripts. Intuitively it seems natural to reduce mistakes by automating processes and restricting influence. In my experience, it is a fine balance of the safety in automated process and the danger in freedom that produces a well-rounded team more apt to optimize the whole rather than their own piece of it. Automated process saves time and avoids unintended errors, but that intentionally dangerous freedom lets the engineers know that they are respected and responsible for the success of their own products. That responsibility provides the motivation to go the extra mile to truly understand, for example, how a service is going to function in a production environment, or how version control really works.

Deployment Is Just the Beginning

Engineering is more than development, it's also deployment and support. Time spent in development is just a fraction of a project's lifetime. The majority of a product's life is in production. Engineering teams that fail to structure priorities around this concept will either produce poor quality products or endlessly miss deadlines because they are fixing bugs.

Learn From Failure

Nothing is perfect. Even rocket scientists at NASA make mistakes [4]. Expect failure and plan for it, in your code and in your culture. Learning from failures and *improving* from them makes a team stronger. I won't say too much about this because much has already been said elsewhere. Etsy has published some great material on this subject [5] and [6]. The Pragmatic Programmers also published a great book on ways to mitigate failures in production [7]. For my part, I'm proud of how we learned to learn from failure at Box [8].


Ultimately, all of this represents my own opinion based on my time spent engineering software. One prescription does not fit all and any prescription should be an ever changing vision.

[1] Leading Lean Software Development: Results Are not the Point: Mary Poppendieck, Tom Poppendieck: 9780321620705: Books
[2] Always ship trunk: Managing change in complex websites
[3] Why Yammer believes the traditional engineering organizational structure is dead
[4] The Martian
[5] Failure is an option - Velocity 2015
[6] Kitchen Soap  –  Learning from Failure at Etsy
[7] The Pragmatic Bookshelf | Release It!
[8] The Three Letters that Saved My Tech Debt

Sunday, November 8, 2015

Managing Runtime Configurations

Configuration Headaches

Managing runtime application configurations in large scale, heterogeneous environments is a total pain. Over the last five years I have attacked this problem in various ways, each with its own grace and flaws. The goal of this post is to sum up the evolution of my experience and hopefully impart some insight to any folks in a similar plight.
It starts with the challenges:
  1. Different environments get different configurations
  2. Different services in the same environment get different configurations
  3. Configuration files get massive and treacherous
  4. Configuration files get complex quickly with cross references
  5. Configurations have no safety against typos in keys (or values)
  6. Configurations have essentially no type safety guarantees
  7. Configurations have no README indicating their intent or usage
  8. Configurations can be accessed from anywhere in a code base (or outside) in inconsistent manners

Sidebar: Defining Runtime Application Configurations

For the context of this article, I use the term “runtime application configurations” to represent the configurations used by an application at runtime. I do not mean configuration-as-code tools like Puppet, Chef, or Ansible. For example, I would consider the log configurations in a logger.xml file for a JVM application that I own to be runtime app configs. I would not consider the configuration of Apache on a web server a runtime app config in this context; I would consider that a system configuration. More specifically, I would consider it configuration for an application whose source code I'm not writing.

Evolution of a solution

Compound keys

The most popular convention I've observed for managing configurations is flat files such as INI, JSON, YAML, or XML; very rarely are they in the same language as the code consuming them. My assumption is that this accomplishes a few things:
  1. They can be edited by non-programmers
  2. They can be consumed by multiple programming languages, thus freeing operations teams from managing multiple manifestations of the same configuration values
  3. In the case of compiled code, configurations do not need to be compiled (or recompiled when they change)
This restriction provides powerful guarantees of simplicity, but it is also limiting. One major limiting factor is that without a separate management system, it’s impossible to have different configuration values per runtime environment. This was a major problem for us, since we shared configuration files between all of our environments but needed to set different values for the same configuration key based on the environment.
Our initial solution was to encode the environmental context onto the keys such that we could effectively define unique values for a given key based on the environment that would use it. So instead of a single key-value pair, we would have multiple key-value pairs differentiated by environment. See an example of log levels for different environments.
level<dev> = DEBUG
level<staging> = INFO
level<prod> = WARN
When the INI gets parsed natively, each of the levels is parsed as a unique key. The next step is custom code in the application to split the environment suffix (e.g. <dev>) off of the keys and then figure out which environment and value applies in its running context.
Sound complicated? It is. Making this work involved writing some complicated INI parsing code as well as figuring out a way to tell an application the environment in which it’s running. The result was a brittle system that was error prone and slowed down onboarding. Furthermore, if any configurations needed to be shared between projects, those projects also needed to solve the parsing and environment setting problems.
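For illustration, the compound-key lookup described above can be approximated in a few lines of awk; the `get_config` name and the exact key/environment syntax (`level<dev>`) follow the example, but this is a sketch, not the original parsing code:

```shell
# Resolve a compound INI key such as "level<dev>" for a given environment.
get_config() {
  local file="$1" key="$2" env="$3"
  awk -F' *= *' -v k="${key}<${env}>" '$1 == k { print $2; exit }' "$file"
}

# e.g. get_config app.ini level prod  (prints the prod value, WARN in the
# example above)
```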

Moving away from flat files

I asked myself, “Why are configurations always in flat files anyway?” I couldn’t come up with a convincing answer, so my next attempt was to write a straightforward configuration framework in PHP that expected different values per each key based on environmental context. All configuration files were PHP files that returned a large array. I experimented with model objects that could build configurations, but ultimately discarded the idea because I felt the time to solve the edge cases would outweigh the incremental value.
With this system, the mind bending key-to-environment relationship was slightly more clear. Another gain was removing cross references to other keys within the INI values. Since the values were set with PHP, it was possible to reuse values or base other values off each other, e.g. url = "{$scheme}{$domain}{$path}".
Although the system was less brittle and easier to understand, the configuration files themselves were still rather large and it wasn’t clear how defaults worked between environments. Worse, the files could only be used by PHP applications.

A separate configuration management system

My biggest problem was the size and complication of putting configurations for every environment in the configuration files. So I decided to look into using a separate system to manage the per-environment differences, which could then deliver only the necessary content to a given consumer. I landed on Puppet since we already use it extensively elsewhere. This allowed me to write files containing only the configurations that mattered in the respective environments. Based on this, going back to flat files was possible. I could also tie together configuration values in Puppet before writing the files, so there was no need for cross references within the configuration file itself.
This was great, my first 5 problems were solved. But still, type safety was not enforced, configuration files had no guaranteed documentation, and understanding how they were used within an application was a grep nightmare.

Configuration model objects

I decided to revisit configuration model objects. However this time not as builders, but instead as accessors. By funneling all access of configurations through a single point, I figured that I could enforce a few things.
I buried the configuration parsing code inside an abstract class (aptly named Configuration) with protected methods for getting at the parsed values. Configuration model classes could then extend the base and expose relevant methods for their configurations. For example, a logging configuration class would have a getLevel() method, which behind the scenes would parse the configuration file and return the value. Any user that wanted to load configurations could only do so by using a Configuration class, or by writing a new one. Next, I tied the name of the configuration file to the name of the class, such that if a class were named LogConfigurations, the configuration file behind the scenes needed to be named log.conf. Finding usages of a given configuration file became trivial.
I added type expectation methods to the parsing code so that invalid values raise exceptions. The Configuration class requires that all extending classes implement a README() method, so that users understand the class's intention and expected INI contents. This is coupled with a generic unit test that parses the README() output and ensures it can actually be used by the class. Furthermore, each of the accessor methods has its own documentation.
Configuration model classes also opened the door for more sophisticated configurations. Since access is inside a class, and not just array referencing, the class can do smart things like tie multiple values together or call out to other classes. One really great manifestation of this was the ability to create the ProjectConfiguration class which uses the project version control to load a configuration file’s contents.

Get involved

That’s where I am today. It’s been a fun journey and I’m happy (atm) with where things are. I’m excited to see where this goes next and to learn other ways folks have found to solve this problem.
One key component that I’d like to see next is the configuration delivery mechanism. Currently, Puppet solves the problem reasonably for short lived PHP processes. However, the two areas that I’d like to improve are:
  1. Automatic reloading by long lived processes. Maybe this is just from a file watch.
  2. A better interface for managing the configurations. This could get extensive. One tool here would be validation. This could be much more than just type checking: semantics and plugins for special configuration groups (e.g. validating that a database in a slave section is up and is not a master). Typos in keys would be completely avoided. Another would be the ability to see what values a given environment would get.
Please post back with comments or pull requests!