Fixing Weird Formatting, The Programmer's Way
→ January 8th, 2010
Here's an unusual problem. A friend of mine suggested I get some Robert E. Howard stories for my Kindle and pointed me to the Project Gutenberg Australia website for free eBooks of his work. "eBook" is a loose term here, since the available versions are either TXT or HTML.
I chose the TXT version, since I could just dump it on my Kindle and have a decent-looking book. The problem: the TXT version is really an HTML page with the eBook inside one large chunk of preformatted text, and every line is hard-wrapped at 80 columns. Even if I stripped the HTML from the source, the line breaks wouldn't match the width of the Kindle's screen.
So I wrote a Ruby script to fix it. It reads the file line by line. A non-blank line is printed back out followed by a space instead of a newline, so consecutive lines join into a single paragraph. A blank line ends the current paragraph and starts a new one. The script can also pass a given number of lines at the beginning through unchanged, since they usually hold the title and author information and should stay on separate lines.
It was also a good excuse to learn Ruby’s OptionParser library, although I didn’t dive in too far.
Enjoy.
#!/usr/bin/env ruby
require 'optparse'

# Default options
dumped_line = false
input_file = STDIN
output_file = STDOUT
skip = 0

# Parse the options
ARGV.options do |o|
  script_name = File.basename($0)
  o.set_summary_indent(' ')
  o.banner = "Usage: #{script_name} [OPTIONS] [input_file] [output_file]"
  o.define_head 'Convert given Gutenberg txt file to a cleaner text file'
  o.separator ''
  o.on('-s', '--skip=val', Integer, 'Lines to skip') { |s| skip = s }
  o.separator ''
  o.on_tail('-h', '--help', 'Show this help message.') { puts o; exit }
  o.parse!
end

if ARGV.count == 1
  # One more argument means an input file is given
  input_file = File.open ARGV[0]
elsif ARGV.count == 2
  # Two more arguments means both input and output file are given
  input_file = File.open ARGV[0]
  output_file = File.open ARGV[1], 'w'
end

input_file.each_line do |line|
  line.strip!

  # Do we need to just pass the lines?
  if skip != 0
    output_file << line
    output_file << "\n"
    skip -= 1
    next
  end

  if line == ''
    # Blank lines mean new lines
    output_file << "\n"
    output_file << "\n" if dumped_line
    dumped_line = false
  else
    # Just dump the line without a new line
    output_file << line
    output_file << ' '
    dumped_line = true
  end
end

input_file.close
output_file.close
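For anyone who would rather call this from other Ruby code than from the command line, here is a rough sketch of the same idea as a function that works on a string. The name reflow and its skip argument are made up for this example, and it is not a drop-in replacement for the script above: it collapses runs of blank lines into a single paragraph break and doesn't leave a trailing space after each paragraph.

```ruby
# Hypothetical helper (not part of the script above): join hard-wrapped
# lines into one line per paragraph. Blank lines separate paragraphs, and
# the first `skip` lines (title, author) are kept on their own lines.
def reflow(text, skip = 0)
  lines  = text.lines.map(&:strip)
  header = lines.shift(skip)

  # Split on one or more blank lines, then glue each paragraph's lines
  # together with single spaces.
  paragraphs = lines.join("\n")
                    .split(/\n{2,}/)
                    .map { |para| para.split("\n").join(' ').strip }
                    .reject(&:empty?)

  [header.join("\n"), *paragraphs].reject(&:empty?).join("\n\n")
end

sample = "Title\nby Author\n\nIt was a dark\nand stormy night.\n\nThe end\ncame quickly.\n"
puts reflow(sample, 2)
```

With the sample text, the two header lines stay separate and each wrapped paragraph comes out as a single line.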
Weeks in a Month Calculations
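→ January 5th, 2010
I was bitten by a nuance in Ruby where the Date “2010/01/01” is actually in the 53rd week of 2009. I probably don't fully understand how the cweek method works, but as far as I can tell it returns the ISO 8601 week number, where weeks start on Monday and week 1 is the week containing the year's first Thursday. January 1st, 2010 was a Friday, so it still belongs to the last week of 2009. To see it in action, fire up irb and try: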
require 'date'
Date.civil(2010,1,1).cweek
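To make the mismatch easier to see, here is a small sketch that prints cweek next to cwyear, the year the ISO week belongs to. It uses only Ruby's standard date library; the variable names are just for illustration.

```ruby
require 'date'

new_year     = Date.civil(2010, 1, 1)  # a Friday
first_monday = Date.civil(2010, 1, 4)

puts new_year.cweek                   # 53
puts new_year.cwyear                  # 2009
puts new_year.strftime('%G-W%V')      # 2009-W53
puts first_monday.cweek               # 1
puts first_monday.cwyear              # 2010
```

So a week number on its own is ambiguous around New Year. Pairing cweek with cwyear (rather than year) avoids the surprise.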
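I needed a new way to calculate all of the weeks in a month, and my old solution was a hack, so I came up with another quick hack to get it right. Below is my code to extend Date and Time to return an array of ranges, one for every week in a month. It relies on ActiveSupport's date helpers (beginning_of_month, end_of_month, end_of_week and beginning_of_day).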
module ActiveSupport #:nodoc:
  module CoreExtensions #:nodoc:
    module Date #:nodoc:
      module MyCalculations
        # Return an array of ranges with the weeks in the month
        def weeks_in_month
          weeks = []
          start = finish = beginning_of_month
          while finish != end_of_month
            finish = start.end_of_week
            finish = end_of_month if finish > end_of_month
            weeks << (start..finish)
            start = finish + 1.day
            start = start.beginning_of_day
          end
          weeks
        end
      end
    end

    module Time #:nodoc:
      module MyCalculations
        # Return an array of ranges with the weeks in the month
        def weeks_in_month
          weeks = []
          start = finish = beginning_of_month
          while finish != end_of_month
            finish = start.end_of_week
            finish = end_of_month if finish > end_of_month
            weeks << (start..finish)
            start = finish + 1.day
            start = start.beginning_of_day
          end
          weeks
        end
      end
    end
  end
end

class Date
  include ActiveSupport::CoreExtensions::Date::MyCalculations
end

class Time
  include ActiveSupport::CoreExtensions::Time::MyCalculations
end