BMS – Tech, Geek Humor, Etc

meh.

Donating My VPS CPU Time

When I was the VP of the Temple ACM, Tim Henry and I headed up an effort to get other ACMers to participate in the Folding@Home project.  In short, this project is a way to donate spare CPU cycles to scientists working on computationally difficult problems.

A few weeks ago, I found myself bothered by how little I’ve put my Linode to use.  For a while, I was using it quite a bit, and then life came along and gave me an uppercut.  So, I’ve decided to go back to donating CPU time to scientific projects.  I’m using BOINC since it allows me to donate to several projects at once and has a pretty awesome Linux client.

If you have spare CPU time floating around, I’d encourage you to do the same!


Making the (Career) Move

I’ve just recently updated my LinkedIn to reflect it, but I left Vanguard back in September to join Coldlight Solutions, a start-up in Wayne, PA.

I’ve not written it here, but those who know me off-line know that I would like to pursue a PhD at some point in my life.  I felt that moving to Coldlight would put me closer to that goal.  Nothing against Vanguard, but I felt that I wasn’t growing that much there.  Writing web services is good and fun, but I was using versions of libraries that were written for Java 1.4.  I wasn’t using anything terribly interesting outside of Spring.  Unfortunately, Spring isn’t the answer to everything.  There are bigger and much more interesting things than Spring in the Java world.

Coldlight does a lot of work with large data sets and machine learning, which happens to be a fairly hot research topic.  I figured the move would give me a chance to explore a lot of new technologies and to gain some experience and insight into a possible PhD research topic.  The move would also have me working with a PhD who could possibly help guide me into my next transition.  After a few rounds of interviewing, I made the switch:  I decided to move to Coldlight.

So far, I’ve been right.  I’ve learned several new technologies, such as Amazon Web Services and Hadoop.  I’ve started learning about different machine learning algorithms by poring through different implementations and papers.  Sure, I occasionally get bogged down in busy-work and get overloaded.  Overall, though, I feel like I’ve been growing.

As an added bonus, I’ve been able to usher in some practices and technologies of my own.  I’ve introduced my team to proper SVN usage, Maven, some pretty cool Eclipse plugins, and a few other things.  I’m directly responsible for some of our process improvement initiatives and for automating some of our development headaches.

Feels good, man.


Gone and Back

Holy crap, where have I been?

All over the place, that’s where. I’ve had a lot happen since I last posted:

  • I now hold a skydiving license
  • I have a new job at a place called Coldlight
  • I have a new girlfriend
  • The 0 A.D. website that I’ve contributed to off-and-on for the last couple of years has been released
  • I’ve learned how to write map/reduce jobs for Hadoop
  • I’ve become a Jenkins/Nexus/Maven guru at work
  • I’ve been playing D&D regularly
  • and a bunch of other stuff!

I have a lot of writing to catch up on.  I’ll be taking some time over the next few weeks to make sure that my LinkedIn, Google+, and all of that other stuff is up-to-date.  I also plan on writing posts on some of the cool stuff I’ve learned and some of the cool devices I now own (like my Samsung Galaxy Tab 2).

Write on!


0 A.D. Wins Project of the Month on Sourceforge

I almost forgot to brag: 0 A.D. was Sourceforge’s Project of the Month in June. Check out the post here: https://sourceforge.net/blog/zero-ad-potm-june-2012/.

It feels pretty good to be on a team that’s being recognized by the rest of the open source community.  Also, this podcast was the first time that I’ve ever heard Erik or Aviv speak.  Now I have a voice to read their posts in!


Loading the English Wikipedia Dump is a Huge Pain

…but I did it!

I decided a while back that it’d be cool to do some graph analysis on the English Wikipedia database.  I want to study how “connected” the articles are.  I also want to collect some statistics about the text of the articles themselves, their references, and some other fun numbers.  At first glance, Wikipedia makes this easy.  Wikimedia (the group behind Wikipedia) publishes its database dumps so that others can download entire copies of any Wikipedia site in any language.  All of the dumps provided are SQL dumps, with one big exception:  the page text.  In order to maintain backward and forward compatibility, MediaWiki began producing XML dumps a while back.  This XML file can be used to generate the whole database, but to do so would mean that a Wiki engine would have to scan and evaluate the text of every single article in order to build some of the relationship tables.  Ew.  Luckily, MediaWiki provides a tool for parsing this data, so easy-peasy, right?

No.

First Challenge:  Huge SQL Dumps

Ok, let’s handle the easy part first.  I downloaded a full dump and spun up a virtual machine in Xen to handle the processing.  Since MySQL is good enough to run Wikipedia, I figured that it was good enough for me, too.  For the sake of completeness, I did not start with a blank database.  I downloaded a copy of MediaWiki 1.9 and used the included SQL script (maintenance/tables.sql) to create a blank MediaWiki database.  Being lazy, I started running commands like this:

zcat some_huge_file.sql.gz | mysql -u me -p en_wikipedia

Don’t get me wrong: that would work…if you wanted to wait several years.  Why was it so slow?  Well, the dumps provided aren’t wrapped in a transaction or anything like that, so the database reindexes after each individual insert.  Slow is an understatement.  To fix this, I wrote and zipped two small SQL scripts.

preimport.sql

SET autocommit=0;
SET unique_checks=0;
SET foreign_key_checks=0;
BEGIN;

postimport.sql

COMMIT;
SET autocommit=1;
SET unique_checks=1;
SET foreign_key_checks=1;

So, what’s going on here?  In preimport.sql, the first line is pretty self-explanatory:  I don’t want my queries to auto-commit.  Rather, I want to choose when and if they are committed.  Secondly, I wanted to disable unique_checks during the import.  I was using MyISAM tables, so this doesn’t do much for me, but those of you out there who may want to use InnoDB tables will benefit greatly from this.  Likewise, I next disable foreign_key_checks.  Lastly, I use the BEGIN keyword to start a transaction, so that the whole import is committed once at the end instead of after every insert.  (Like the checks above, this only helps on a transactional engine such as InnoDB; MyISAM ignores transactions.)  As you can see, postimport.sql is just the exact opposite of preimport.sql.  I also mentioned that I gzipped them.  Why?  So I could do this:

zcat preimport.sql.gz some_huge_file.sql.gz postimport.sql.gz | mysql -u me -p en_wikipedia

That statement will, all in the same MySQL session, read the contents of preimport.sql, the huge SQL file, and then postimport.sql.  If you’re cool, you can even wrap that whole mess with the time command so that you can get an exact time of how long it takes to run each import.  All in all, most of my imports ran in only a few hours.  On on!
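The concatenation trick is easy to try on a small scale before committing hours to a real import.  The sketch below is only a toy:  the table name and the one-line INSERT are made up for illustration, and cat stands in for the mysql client so that the combined stream can be inspected.  It relies on the fact that decompressing several gzip files in one command produces a single continuous stream.

```shell
#!/bin/sh
# Toy demonstration: several gzipped files are decompressed into ONE stream,
# so the client receives the pre-import, dump, and post-import statements
# in a single session. The sample INSERT and table name are hypothetical.
workdir=$(mktemp -d)

printf 'SET autocommit=0;\nBEGIN;\n' | gzip -c > "$workdir/preimport.sql.gz"
printf 'INSERT INTO demo VALUES (1);\n' | gzip -c > "$workdir/some_huge_file.sql.gz"
printf 'COMMIT;\nSET autocommit=1;\n' | gzip -c > "$workdir/postimport.sql.gz"

# "gzip -dc" is what zcat runs on most Linux systems.
# For a real import, replace "cat" with: mysql -u me -p en_wikipedia
combined=$(gzip -dc "$workdir/preimport.sql.gz" \
                    "$workdir/some_huge_file.sql.gz" \
                    "$workdir/postimport.sql.gz" | cat)

printf '%s\n' "$combined"
rm -rf "$workdir"
```

In bash, putting the time keyword in front of the whole pipeline reports how long the entire import took, not just the first command.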

Second Challenge:  That Huge XML File

By this point, we have just about all of the metadata we could want.  What about the real stuff?  You know, the content?  Oh yes, that!  Well, that’s where things get a bit sticky.  There are 3 common options that I’ve found for importing the data from the enwiki-sometimestamp-pages-articles.xml.bz2:

  1. importDump.php
  2. xml2sql
  3. mwdumper

importDump.php is a tool built into MediaWiki.  It can produce an entire database from one of these XML dumps.  Unfortunately, it’s really slow.  It’s not recommended for use on larger data sets.

xml2sql is an ANSI C program that can extract page and text information, but not any of the metadata.  It’s currently not maintained and it’s not officially supported by MediaWiki.

mwdumper is the official MediaWiki tool.  Unfortunately, it’s not well supported, either, but it turned out to be my best bet.

In order to read the XML dump, I went with mwdumper since it’s the official tool and it’s written in a language that I’m quite dangerous with.  The first thing to note is that the most up-to-date version is not available as a binary.  I had to download the source and build it myself.  If you’re familiar with the SVN-Maven-Java stack (or you use STS) then this is pretty simple and straight-forward.  I’m not going to cover how to build the software here.  I assume that either the reader is able to do this already or that there are instructions for doing so on the project’s page.

Once I was able to produce an executable JAR, I bumped into three problems.  The first one involves the file’s format. The file that MediaWiki produces is a valid bzip2 file, but for whatever reason, mwdumper does not recognize it as such.  You can either decompress it first, or use a pipe.  I wasn’t very creative, so I first decompressed it, and then ran this:

java -server -jar ../mwdumper-1.16-jar-with-dependencies.jar --format=sql:1.5 temp2.xml | gzip -vc > enwiki-latest-pages-articles.sql.gz

Here, temp2.xml is the decompressed version of the XML dump, and enwiki-latest-pages-articles.sql.gz is where I want the SQL script to go. This command will process the XML, convert it into SQL INSERT statements, and then pipe the result through gzip so that the output stays small(er).  When mwdumper is finished, we’ll have a gzipped SQL file that we can handle just like we did the others in the first challenge.
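For those who would rather not keep a decompressed copy of the dump on disk, the pipe alternative can be sketched as a small shell function.  This is a sketch under assumptions, not something I ran:  it assumes that mwdumper reads the XML from standard input when no input file is named, and the function name convert_dump is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: stream a .bz2 dump through mwdumper without
# writing the decompressed XML to disk.
# ASSUMPTION: mwdumper reads from standard input when no file is given.
#
# Usage: convert_dump DUMP.xml.bz2 MWDUMPER.jar OUTPUT.sql.gz
convert_dump() {
  bzip2 -dc "$1" \
    | java -server -jar "$2" --format=sql:1.5 \
    | gzip -c > "$3"
}
```

It would be called along the lines of convert_dump enwiki-sometimestamp-pages-articles.xml.bz2 ../mwdumper-1.16-jar-with-dependencies.jar enwiki-latest-pages-articles.sql.gz.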

The second problem is that some of the queries are larger than my MySQL server would allow.  To handle this, I had to modify my /etc/mysql/my.cnf file (on Ubuntu) and change this setting in the [mysqld] section:

max_allowed_packet = 128M

This setting controls the maximum size of a query that the server can receive. By default, this value is rather small. Since we’ll be importing rows that contain whole articles, this value must be raised. The value above should work just fine.
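To confirm that the new limit took effect after restarting MySQL, it helps to know what 128M looks like in bytes, since that is how the server reports the variable.  The snippet below is a small sketch; the user name in the commented command is the same placeholder used earlier.  The mysql command-line client also has its own max_allowed_packet limit, so if a large statement is still rejected, passing --max_allowed_packet=128M to the client may be worth a try.

```shell
#!/bin/sh
# 128M in bytes: the value the server should report once the setting is active.
expected_bytes=$((128 * 1024 * 1024))
echo "max_allowed_packet should read: $expected_bytes"

# After restarting the server, compare against the live value
# (this prompts for a password, so it is left as a comment here):
#   mysql -u me -p -e "SHOW VARIABLES LIKE 'max_allowed_packet';"
```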

The third and last problem is that mwdumper would come to a grinding halt a little over 4 million records in.  It would throw a strange error about UTF-8 encoding.  I didn’t see anything wrong, and I didn’t have any reason to believe that the file wasn’t being encoded correctly, so I inspected the first few lines of the file and found that there was no XML declaration.  It appears that mwdumper assumed that it was UTF-8, so I decided to add the following header:

<?xml version="1.0" encoding="ISO-8859-1"?>

This is a common encoding for western European languages. There might have been characters that got mangled in the process, but I wasn’t terribly worried about the occasional character here and there. If you are, then you’ll need to find your own workaround for this issue.
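A multi-gigabyte XML file is not something to open in a text editor, so the declaration is best added from the shell.  The following is a minimal sketch using a tiny stand-in document; the file names and sample content are hypothetical.  It only prepends the declaration when the first line does not already have one.

```shell
#!/bin/sh
# Prepend an XML declaration to a file by streaming it, without an editor.
# The tiny sample document and file names are hypothetical.
workdir=$(mktemp -d)
printf '<mediawiki>\n  <page/>\n</mediawiki>\n' > "$workdir/temp.xml"

# Only add the declaration if the first line does not already start with one.
if head -n 1 "$workdir/temp.xml" | grep -q '^<?xml'; then
  cp "$workdir/temp.xml" "$workdir/temp2.xml"
else
  {
    printf '%s\n' '<?xml version="1.0" encoding="ISO-8859-1"?>'
    cat "$workdir/temp.xml"
  } > "$workdir/temp2.xml"
fi

first_line=$(head -n 1 "$workdir/temp2.xml")
line_count=$(wc -l < "$workdir/temp2.xml")
printf '%s\n' "$first_line"
rm -rf "$workdir"
```

Because the original file is streamed through cat, this needs roughly as much free disk space as the dump itself, but very little memory.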

Once these 3 issues are cleared, mwdumper will be able to produce usable SQL that you can import. If you use the mwdumper command that I provided above, you’ll end up with a gzipped SQL file that can be imported just like any of the other compressed SQL files in the first challenge.

Conclusion

Not having this information up-front was a huge pain in the butt. I’m also still not sure about the UTF-8 encoding issue. In retrospect, I really don’t know if the file was correctly encoded or not. At some point, I’d like to automate this process so that imports aren’t such a hassle.


Book Review: Effective Java, Second Edition

A few hours ago I finished reading Effective Java, Second Edition by Joshua Bloch.  Bottom line up front:  totally worth the read!

Format

The book’s format is interesting.  At first, I wasn’t sure if I liked it, but it grew on me.  The book is divided into 78 items, each one detailing arguments for an idiom or principle.  Many times, these items reference each other, especially when in the same group.  Sometimes this lends itself to the temptation to skip ahead to one of the mentioned items, but I managed to resist.  It’s interesting to see how some of the items at the very end of the book relate back to ones at the very beginning.  Forcing myself to wait to read about some of the related items gave me time to reflect on the item at hand and to kind of put it in the back of my mind before moving forward.  As I came across references to earlier items, I found that the earlier items made much more sense.  Not that they didn’t make sense before, but sometimes Bloch would write a line or two that referenced items that I hadn’t covered yet.  All of a sudden, those extra few lines here and there started making a lot more sense.  I think he did a very good job of linking items while allowing them to stand on their own.  I didn’t have to skip ahead to understand the item at hand.

The code used in the book is often reused when possible from item to item.  This makes it even easier to link common items together.  Because I was already familiar with the code, I could understand why parts of it were structured a certain way, which allowed me to focus more on the idiom at hand.  Often in programming books, examples are written to illustrate a point in such an oversimplified fashion that they end up violating principles earlier in the book to demonstrate the current topic or they end up not offering enough code for a full demonstration.  Bloch does a really good job of avoiding this.  If he demonstrates a principle that overlaps with another one, he’ll write the code such that it adheres to both and note it accordingly.  I also like that each principle comes with some “good code, bad code” examples.  While reading through some of these, I couldn’t help but be reminded of “Good Idea, Bad Idea” from the Animaniacs.

Structure of Arguments

I can’t remember the last time I read a book in which an author provided so much balance to their arguments.  Granted, sometimes Bloch did strongly advocate for or against something.  In most of those cases (I say most because I don’t have enough background knowledge to make judgements on some of them), the strong stance was warranted.  Often, each argument came with a list of times when it was appropriate and when it was to be avoided.  Sometimes, simple litmus-test type questions were given in order to aid a developer in figuring out when a principle applies.  To me, this was *the* most important content of the book.  Some of the principles have obvious applications, but others were more obscure.  The list of principles does me no good if I cannot make an intelligent argument for or against one when developing a system.  I now have simple questions that I can ask myself or others to help determine what patterns or idioms are appropriate to apply to a problem.

Citations

You can’t read an item without seeing several citations.  Looking at the list of sources in the book’s appendix gives the reader a sense of just how much work and knowledge went into creating this book.  His sources run the gamut, from items as low level as the Java specification itself all the way up to citing himself and other best-practice authors.  This gives the reader the tools to go deeper should they choose to.  Want to know why certain features of the language aren’t guaranteed?  Go read spec-X!  Want to be a multi-threading wizard?  Go read guide-Y!  As a result of this book, my next purchase will be the book that he cites on multithreading topics (I don’t remember the title off-hand).  I’m also now curious to try my hand at going through some of the language specs.  Way to spark interest!

Final Thoughts

While this book is no bible, it was effective immediately.  I was able to easily apply some of these principles to my work and open source programming efforts.  I feel like I have a much deeper understanding of the language and how certain programming errors start and propagate themselves.  I’ve thought of type safety issues before, but the book took this sentiment a step further by getting me into the habit of structuring my code so that more errors are caught at compile-time.  This has been a great lesson in syntax, patterns, and quality assurance.  I will be reading this book again — it was full of gems, and I’m eager to go back a second time and to see what else dawns on me.


Boxee Box Review

For the last few years, I had been using a modified Xbox running XBMC.  This was great for streaming stuff (movies, music, etc) from my Ubuntu SMB server to my TV.  Now that I own a TV capable of high-def, I wanted more.  I also wanted to lower my cable bill.  Unfortunately, XBMC does not directly support premium video services like Netflix.  Furthermore, the 3rd party Netflix plugins won’t work for XBMC if you’re not running on a Windows or OS X machine due to the lack of MS Silverlight.  I decided it was time to switch:  I wanted a device that can stream from premium sources AND my local network.  Enter the Boxee.

Why Boxee?

The biggest thing was definitely the network media streaming.  Very few of the media center appliances have this.  I have a file server that’s full of my (legal, of course) music, movies, and TV shows that I’ve acquired over the years.  I want to be able to watch those on my TV, too. Plus, I shouldn’t have to fire up my computer just to play a few MP3s through my stereo.  If I’m going to buy something to play media to my TV, it better handle local network media, too.

The second biggest thing was Netflix.  I wanted to replace my cable with something that I wouldn’t get bored of.  I used to share a Netflix account with a friend.  I remember being blown away by the selection and quality of the media they offered.  The Netflix part of the decision was a no-brainer:  it was mandatory.

Third, I own a high-def TV.  I wanted something that could handle high-def resolutions.  Even if Netflix isn’t full HD, it might be some day.  Also, some of my local network content is high-def.  Not that I’m picky about connectors, but having a box that uses HDMI makes life easy. One plug to rule them all!

What’s to Love?

I’m glad you asked!  First and foremost, the Netflix app is pretty awesome.  It’s not given me any stability issues, it’s easy to use with the remote, and the picture looks great.  Setup was a breeze.  The first time you power it on, you’re prompted to sign into a wifi network, configure the output resolution, and a few other things.  I also like that I get recommended content in my home screen.  As of now, it’s just free content from YouTube and a few other sources.  I’d like to see if I can modify that, but for now it’s fun.  I end up finding some really cool short movies whenever I go flipping through the home feed.

The remote is pretty awesome.  On one side, it’s a directional pad with a select and back button.  On the other side, it’s a QWERTY keyboard.  The keyboard is really cool for searching for content.  My friend Tim used to launch Netflix through his Wii, and typing anything of length by using the Wiimote was a pain.  The keyboard is a cool feature.  In addition to the remote, there’s also a built-in web server.  The web server doesn’t put out human-readable web pages, but it does allow for 3rd party apps to control the Boxee.  For my Android phone, I found an app called Boxee Thumb.  Basically, it auto-detects the Boxee (if you’re connected to the same network) and allows you to use swipe gestures, soft buttons, or your phone’s keyboard to control the unit.  The swipe gestures are pretty cool; I enjoy being able to swipe through playlists.

The Boxee also allows for 3rd party software.  The default software channel has 200+ apps.  You have the ability to add additional repositories, which provide additional software.  I have yet to try any third party channels, but I’m looking.  You can also write your own apps and either load them onto a channel or onto a USB stick and run them locally.  The interface is done by using HTML5/CSS3.  Python is used to access the Boxee’s functionality and can be embedded right into the HTML to provide access to system functions.  Pretty cool!

What’s to Hate?

A lot, unfortunately.  The Boxee has a ton of great features, but they’re offset by the interface.  It simply sucks.  Most of my negative opinion centers around the interface’s instability.  If I stay in an app, everything’s golden.  As soon as I start switching apps and switching windows, the Boxee will freeze and need a hard reset.  What a pain!  If my phone crashed that often, I’d throw it at someone.  Fortunately, I don’t venture from app-to-app too often, so it’s tolerable, but leaves a bad taste in my mouth.

Some parts of the interface are also rather ugly.  Local media browsing is one of those places.  When it works right, it looks nice and is able to pull episode, movie, and track information from the web.  When it doesn’t work, it’s a pain.  Manual resolution sucks.  Some of it is my fault for not having things properly labeled, but I should be given a better interface (perhaps web?) for fixing unidentified media.  Also, scanning of network (in this case, SMB) shares causes a noticeable lag.  I’d rather have a slower scan that doesn’t bring down the system than to have a faster scan that doesn’t share the CPU well.

I hate to say it, but the box itself is almost as ugly as some of the UI components.  I don’t think it’s attractive at all.  Instead of rounding corners, I simply have one sliced off.  The appearance is awkward at best.

Overall

This thing’s saving grace is all of the codecs and external media features.  Overall, I’d give it a 3.75/5 stars.  The interface gets a 2/5 for sucking so bad, and the feature set gets a 5/5 for being so rich and complete.  I would recommend this unit for those who are more tech savvy, have more advanced needs, or are simply patient and are willing to wait for these things to be fixed via updates.


Working Hard on the 0 A.D. Website

Shortly before Christmas, I got this message from Erik, the producer of 0 A.D.:

I hope things are well with you. Have you still gotten a moment or two for 0 A.D. or have you become too busy? We’ve been trying to contact you via PM/email, but haven’t got any response, so it’s hard to know the case of things.

If you do have a moment or two we have things we could use your help with. Both some forum upgrades etc, but more importantly/hopefully more interesting as well we are finally getting closer to a new web site. We do have a designer working on the design, and a web development applicant who might be able to help out with the technical stuff. But naturally it would feel a lot better if you would be in charge of the technical side as we know and trust you.

Either way I hope you will have a good Christmas and regardless of whether or not your current situation allows you to help out I’ll be grateful for a reply so we know how things are :)

D’awwwwwww.

Since then, I’ve gotten pretty serious about getting a new site delivered.  We’ve been making a more conscious effort to stay organized, work hard, and get something tangible accomplished.  If you follow the project’s progress at all, you’ll see that we’ve recently closed a bunch of website tickets, mostly revolving around the new site.  Barring any major setbacks, we’ll keep pushing forward!

Shameless plug:  If you have ever had an interest in joining a F/OSS group, are looking for something to put on the resume, or just have too much time on your hands, hit me up!  We could always use more help.


Welcome, 2012!

Twenty-eleven has had its share of ups and downs for me.  All in all, it’s been an amazing year.

Just this past week, I’ve managed to really live up my last bits of 2011.  During the last week of December, I went to Longwood Gardens to check out their night time light display.  It was worth dealing with the cold (it got down to 26F that night) to see their winter time display.  The last time I was at Longwood was during the day.  There were some parts that I liked better in the daylight, but the night time was still really impressive.  To be honest, I think Longwood is much more worth visiting at night.  One of the best parts was a fountain display near the entrance into the gardens.  All of the water jets had lights under them that would change color as the music played on.  The jets alone made for an interesting show, but the lights definitely made it much more captivating.  That alone was worth dealing with the night time cold.  To help beat the cold, I spent some time hiding in the Cafe.  It was a little bit expensive, but the food was good and they had beer (how could you go wrong?!).  Though it has a very cafeteria look, the food was definitely not cafeteria quality.  I’m kind of sorry that I didn’t find that sooner.

I’ll skip over the part about the New Year’s Eve party and leave that for the people who are on my Facebook :) .

New Year’s Day was really special this year.  I’ve lived in Philadelphia for most of my life, but I had never seen the Mummers Parade.  Well, that changed.  I spent a few hours of this year hanging out on Broad Street and watching the show.  I don’t think I’ve ever been this fascinated with costumes before.  There’s just something about watching a bunch of men dressed like women march around while playing the banjo!  Today was a perfect day for it:  it was in the high 40s/low 50s, sunny, and relatively calm.

Now that 2011’s over, I have a lot to look forward to in 2012.  For starters, I’ve been asked to resume my work on 0 A.D.’s website.  I’ve also started work on finding a way to use 0 A.D. in computer science education for high schoolers (more on that later).  The big change this year is the new job, but I’ll leave that for another post.

Happy 2012!


GRUB2: Do Not Want

I’m writing this partially because I hope to save a few people the headache I just had and partially because there are no babies around to eat.

Background

So, I recently sold a bunch of stuff on eBay (as I previously mentioned).  I then used that money to buy a new server.  For the sake of making this article easier to find, here’s what I bought:

Rackable Systems C2004 with:

  • Intel S5000PSL motherboard
  • 2 x 2.66 GHz dual-core Xeon processors

The idea was to load up this server with RAM (it’ll take 32GB) and hard drives (4 x 2TB SATA II) so that I could replace all of my other servers.  This machine has plenty of horsepower for performing my GIS work and for reducing my server count via virtualization.  So, like I do with all servers, I went to install Ubuntu Linux.  This is where the fun ends.

Trying to Set-up the Server

I will spare you some of the details, but I ended up trying to install Ubuntu via a USB thumb drive and a good ol’ IDE CD-ROM drive.  Every time I ran the installer (for 10.04.3 and 11.04), the install worked.  Everything appeared to be set up without a hitch.  Great, wonderful, weee!  And then…

Good Luck Booting, Bro

It wouldn’t boot.  No matter how I did my install, with or without LVM, with or without EXT4, with or without sacrificing goats, it just wouldn’t boot.  I would just get a black screen with nothing more than a single blinking cursor.  It wouldn’t respond to keyboard input, yelling, or ritualistic dances.   Booting the install media in rescue mode showed that the logs were empty.  Clearly, I have a hardware problem.

From there, I started monkeying around with BIOS settings.  I made sure my on-board (fake) RAID was disabled, that all of the settings were reasonable, and that I didn’t have anything strange turned on.  No dice.  After days of troubleshooting, I started hitting the forums.  One post suggested that if I wasn’t getting a GRUB screen at all, then it was a GRUB problem.

Fixing It

The problem was that I was using GRUB2.  For some reason, it just didn’t play nice with my mobo.  So, I booted into rescue mode and executed a shell in my root partition and ran this (Because I was in a rescue terminal, I was already root.  If you’re not root, you must prefix these with sudo):

apt-get remove grub-pc
apt-get install grub
grub-install /dev/sda
update-grub

Magic!  It booted!  Hopefully this saves someone a headache.
