Tuesday, December 29, 2009

Bit (not Byte) Manipulation in Ruby

I was recently tasked with creating a rough version of the Lempel-Ziv 77 encoder/decoder engine for use in most operating systems (i.e. Windows, Linux, Mac). The application would need to read a binary file and compress it or decompress it to another binary file. The compression algorithm involved a format bit specifying compressed or literal bytes to follow and then distance and length bits of instructions for compressed data. Such an application would clearly involve a good deal of bit manipulation and consequently require a solid bit manipulation library.

The logical language of choice to me was C++ because of its proximity to the memory, inherent ease of bit manipulation, and presence on every computer since I was born. Unfortunately, I can probably barely compile a "hello, world!" application in C++ =( Next I considered Java since it's open source and present on most people's computers. However, my Java skillz have sadly dwindled since college to the point that I frustratingly discarded that project about an hour after I started. Finally, I decided upon Ruby as my language of choice -- mainly because I like coding in Ruby.

My project got off to a good start until I realized that the original research I'd done on manipulating bits in Ruby had been incomplete. Ruby inherently manages characters and bytes synonymously, but bits are another story. Based on the loose typing model of Ruby, any use of bits throughout my code was being converted to their numeric string representation behind the scenes. For example, 0xff was ending up as the string "255" when I was writing it to a file.
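To make the problem concrete, here is a minimal sketch of what I believe was happening. It uses StringIO as a stand-in for the file (my assumption, so it runs without touching disk), and the variable names are made up for illustration.

```ruby
require "stringio"

# StringIO stands in for a real file; "".b gives it a binary buffer.
text_out = StringIO.new("".b)
text_out << 0xff               # << calls to_s on anything that is not a String
as_text = text_out.string      # the three characters "255"

byte_out = StringIO.new("".b)
byte_out << [0xff].pack("C")   # pack first, so exactly one byte is appended
as_bytes = byte_out.string.bytes

puts as_text.inspect           # "255"
puts as_bytes.inspect          # [255]
```

The integer never reaches the stream as a raw byte unless it is packed into a string first.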

Finally, after much worrying, reading of documentation, online research, and irb investigation, I had an answer.
  • Bytes can be specified in Ruby per bit as such, 255 = 0b1111_1111 (the underscore is an optional visual separator, used here between each group of four bits). This was important for me since I was doing a lot of shifting and didn't want to worry about the actual numerical values in my unit testing.
  • Bytes can be written explicitly to files in Ruby using the << operator along with Array.pack.
File.open("foo.txt", "wb+") { |f| f << [0xff].pack("C") }
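  • Bytes can be easily read using each_byte on an open File (IO#each_byte)
  • The byte code for a given character can be accessed using: "a"[0] (this returns 97 on Ruby 1.8; on Ruby 1.9 and later use "a".ord or "a".getbyte(0))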
  • Binary file manipulation on Windows must be done using the "b" flag when opening the file. Otherwise, Windows text mode will treat certain bytes (as far as I can tell, 0x1A, i.e. Ctrl-Z) as an end-of-file marker and ignore the remainder of the file. I learned this the hard way because each_byte would just inexplicably stop reading in bytes from my file before the file was finished.
After I had all of this figured out, Ruby proved to be a very nice environment for writing the app.
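Putting the tips above together, here is a rough sketch of packing a format bit plus distance and length fields into bytes and round-tripping them through a binary file. The BitWriter class, the 12-bit distance and 4-bit length widths, and the out.bin file name are all assumptions made for illustration, not the format my encoder actually used.

```ruby
require "tmpdir"

# Hypothetical helper: collects bits MSB-first and packs them into bytes.
class BitWriter
  def initialize
    @bits = []
  end

  # Append the lowest `width` bits of `value`, most significant bit first.
  def write(value, width)
    (width - 1).downto(0) { |i| @bits << ((value >> i) & 1) }
    self
  end

  # Pad the last byte with zero bits and return a binary string.
  def to_bytes
    padded = @bits + [0] * ((-@bits.size) % 8)
    padded.each_slice(8).map { |byte| byte.join.to_i(2) }.pack("C*")
  end
end

writer = BitWriter.new
writer.write(0, 1).write("a".ord, 8)         # format bit 0, then a literal byte
writer.write(1, 1).write(5, 12).write(3, 4)  # format bit 1, distance 5, length 3

read_back = []
Dir.mktmpdir do |dir|
  path = File.join(dir, "out.bin")
  # "wb" and "rb" keep Windows from treating the data as text.
  File.open(path, "wb") { |f| f << writer.to_bytes }
  File.open(path, "rb") { |f| f.each_byte { |b| read_back << b } }
end

puts read_back.map { |b| format("%08b", b) }.join(" ")
# prints: 00110000 11000000 00010100 11000000
```

Keeping the bits in an array is slow but easy to unit test; a real encoder would more likely accumulate into an integer and shift.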

New Class Templates in C#

My team finally upgraded to .NET 3.5 along with Visual Studio 2008 recently, which has been a huge source of happiness for me. However, since the upgrade, I've been getting annoyed with VS's insistence that I always include the LINQ namespace in each of my new classes. I typically remove all of the default using directives anyway, but since I don't automatically reference the System.Core assembly that contains the LINQ definitions, I was getting a pre-compile error from R# every time I added a new item to a project. In addition, I've been growing tired of having to type "public" each time I add a new class (since the default is nothing). So, I decided to take action.

A little bit of web searching led me to the Visual Studio Template Reference. Each time a new item is created in Visual Studio, VS finds the template definition that matches the item and generates the code to match. Templates support logical control flow and variable replacement. By default the C# class templates are stored in "\Program Files\Microsoft Visual Studio 9.0\Common7\IDE\ItemTemplates\CSharp\Code\1033", with similar templates stored nearby. The instructions are geared toward readers creating their own templates for new types of items, which is very useful (though I can already do that with ReSharper), but I really just wanted to change the default. So I went in and did it: no more "using" and always public.

Before my changes would take effect, I first had to close Visual Studio and then run the following command,

devenv /installvstemplates

which will rebuild the VS templates run-time cache folder.

Now I can focus on all the fun features of 3.5!

Wednesday, December 9, 2009

VI - Macros

Abstracting repetitive steps of work in an on-the-fly macro is a task I find myself doing almost daily during development or database scripting. After hammering out a pretty good one today, I was wondering to myself how to go about saving it. I did some Google searching ("save vi macro", "copy vi macro"), but came up empty. Then I realized that I already knew the answer. Whenever I record a new vi macro, I record to the q register. So I knew that my macro must be sitting as plain old text in the q register. So, I pasted the q register to my screen (with "qp) and lo and behold, there was my macro! I created a macros text file to track all my commonly used macros with descriptions of what they do. Yay!

Wednesday, October 7, 2009

VI and Ruby

I've decided it's way past time to get up to speed with Ruby. After several years of constantly changing my personal idiom for JavaScript styling due to learning things in pieces, I decided that I would try to learn the Ruby style up front. The two biggies that I've been violating are,
  1. One blank space between comment hash (#) and first character
  2. Spaces (and two of those, btw) not tabs
Since I'm just starting this blog thing, I haven't had the chance yet to mention that I'm a big VI fan. Naturally, I'm doing my Ruby development in VI. So, first I needed to make some use of VI's find and replace expression with backreferencing power for #1,

# Insert a blank space into all comments not beginning with one blank space but be careful to avoid replacing other uses of # (e.g. #{})
:%s/#\(\w\)/# \1/

Nice! That was easy. Yet another experience of feeling pretty good about VI.
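For anyone who would rather see the same pattern outside of VI, here is a rough Ruby equivalent. The fix_comment_spacing name and the sample text are made up for illustration; like the VI command above (which has no g flag), it only touches the first match on each line.

```ruby
# Hypothetical helper mirroring :%s/#\(\w\)/# \1/
def fix_comment_spacing(source)
  # \w only matches a word character, so "# ok" and "#{...}" are left alone.
  source.each_line.map { |line| line.sub(/#(\w)/, '# \1') }.join
end

sample = <<~'RUBY'
  #bad comment
  # good comment
  puts "total: #{count}"
RUBY

fixed = fix_comment_spacing(sample)
puts fixed
# prints:
# # bad comment
# # good comment
# puts "total: #{count}"
```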

Now on to style tip #2. Other than the simple search/replace to clean up my existing code, I had to change my tabs to spaces. Some simple changes to my .vimrc and I was ready to go!

" tab = two spaces
set tabstop=2
" indent by two spaces as well (used by >> and autoindent)
set shiftwidth=2
" use spaces and not the tab character
set expandtab



Friday, August 21, 2009

log4net: No layout set for the appender named

I recently created a new ASP.NET web project and set it up to use log4net with a RollingFileAppender as its logging framework. However, at first my log file wasn't showing up at all and then after I got it showing up, my log statements were not being written. Fixing this involved two steps.

First, the DLL reference generated by Visual Studio did not automatically set the log4net assembly to copy local. Setting this property on the reference solved the issue of my log file not appearing.

Second, now that I had my file, I was not seeing any log statements written to it. Debugging the process, I saw the following error output from log4net:

AppenderSkeleton: No layout set for the appender named

Looking at my configuration file, I saw that I was setting an appender layout. However, I noticed that my appender layout was based on a type defined in a different project. Checking the reference to that project, I again found that Visual Studio was not copying a local version of the assembly, i.e. the Private flag was set to false. Switching this flag to true fixed the problem and now my logs are showing up!

Wednesday, August 19, 2009

Long Method Names

As code completion (or auto complete) has become more ubiquitous and second nature to most software developers, so too has the tendency to spend less time thinking critically about the names that we assign our variables and methods. In place of this, I have observed a tendency to squeeze the entire purpose of the variable or method into the name (e.g. DoThisUnlessThatIsTrueAndBeGentle()). Perhaps we may soon have the entire algorithm of a method squeezed into a name (with the obvious advantage of not needing to read the method at all!). Have we lost (or forgotten) the advantage of abstracting the purpose of a variable or implementation of a method (or class) behind a simple, distinct name?

Disadvantages:
1. It's almost never possible to completely and accurately name a variable or method. Any attempt to do so may lead to more confusion.
2. Side-by-side comparisons of two versions of the same code become more painstaking with horizontal scrolling -- perhaps a bug is even missed because the developer is too lazy/busy to mess with the scrolling.
3. Long names force statements to wrap, constraining the code to multi-line formatting.

Advantages:
1. More descriptive names?

I am often reminded of an article by Joel Spolsky (back when I read his stuff): http://www.joelonsoftware.com/articles/Wrong.html (scroll to the "I'm Hungary" section towards the bottom) when I ruminate on this subject.