Feb 09 2011

Los Angeles Hadoop Users Group - LA-HUG

Published by under Hadoop

LA now has its own Hadoop User Group (HUG). The first meetup will be held on Wednesday, 2/9/2011. This is a great opportunity for anyone in the Los Angeles area interested in Hadoop and related technologies to meet and discuss.

The first talk is: “Operationalizing Hadoop” with Charles Zedlewski (Cloudera’s VP of Product)

http://www.meetup.com/LA-HUG/

Nov 16 2010

fatal: The remote end hung up unexpectedly

Published by under Miscellaneous

I committed changes to my GIT project, tried to push them to the remote server (git push) and got the following cryptic error message:

fatal: The remote end hung up unexpectedly

The GitFaq states that:

Git push fails with “fatal: The remote end hung up unexpectedly”?
If, when attempting git push, you get a message that says:
fatal: The remote end hung up unexpectedly

There are a couple of reasons for that, but the most common is that authorization failed. You might be using a git:// URL to push, which has no authorization whatsoever and hence has write access disabled by default. Or you might be using an ssh URL, but either your public key was not installed correctly, or your account does not have write access to that repository/branch.

I used “git config --list” to review my project configuration:

core.repositoryformatversion=0
core.filemode=true
core.bare=false
core.logallrefupdates=true
core.ignorecase=true
remote.origin.fetch=+refs/heads/*:refs/remotes/origin/*
remote.origin.url=git://git.some-domain.com/my-project
branch.master.remote=origin
branch.master.merge=refs/heads/master
branch.v2.remote=origin
branch.v2.merge=refs/heads/v2
gui.geometry=1374x727+34+82 301 201

The culprit here is the “remote.origin.url” property: it points to a git:// URL, which is read-only. We can change this using “git config --edit”. We want to change:

remote.origin.url=git://git.some-domain.com/my-project

to

remote.origin.url=git@git.some-domain.com:/my-project

Please note that the user “git” is specific to my setup, and your GIT setup/configuration is most likely different. After this change I was able to successfully push my changes to the remote server.

I ended up with the read-only URL because I cloned the project using “git clone git://git.some-domain.com/my-project” when I should have used “git clone git@git.some-domain.com:/my-project”.
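Instead of editing the configuration by hand with “git config --edit”, the URL can also be changed from the command line with “git remote set-url”. Here is a small sketch that demonstrates this in a throwaway repository; the host and project names are the placeholders from this post, and nothing is fetched or pushed, so the host does not have to exist. In your own clone only the “git remote set-url” line is needed.

```shell
set -e
# Throwaway repository standing in for a clone made with the git:// URL.
# git.some-domain.com and my-project are placeholders; nothing is contacted.
repo=$(mktemp -d)
cd "$repo"
git init -q
git remote add origin git://git.some-domain.com/my-project
git config --get remote.origin.url   # git://git.some-domain.com/my-project

# Point origin at the writable SSH URL (same effect as editing the
# remote.origin.url line with "git config --edit").
git remote set-url origin git@git.some-domain.com:/my-project
git config --get remote.origin.url   # git@git.some-domain.com:/my-project
```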

Oct 17 2010

Hadoop World

Published by under Miscellaneous

I just came back from the Hadoop World conference in New York, and I have to say that it was quite exciting. Processing huge amounts of data used to be a problem for just a few companies like Google, Yahoo and Facebook, but it has now become a problem for many. The conference topics were interesting, and the training held by Cloudera was really good.

My personal recommendation is to get up to speed quickly on Hadoop and related technologies such as HBase, Hive and Pig, since I think that ever-growing data sizes will soon make these tools commonplace. It takes time to learn how to think at scale and to use these tools properly. I’ve now seen “big data” grow to such sizes that not even big clustered databases like Oracle RAC can process it and extract the information we need quickly enough. Hadoop is not a universal tool for big data problems, but for a certain set of problems it’s quite powerful and provides almost linear performance scaling as you grow your compute cluster.

Cloudera has excellent Hadoop videos to get you started here: http://www.cloudera.com/resources/?media=Video. Tom White’s “Hadoop: The Definitive Guide (2nd Edition)” is excellent, and I highly recommend it.

Sep 17 2010

Unix/Linux Sort Multiple Columns, Tab Delimited and Reverse Sort Order

Published by under Unix/Linux

Sorting a tab-delimited file using the Unix sort command is easy once you know which parameters to use. An advanced sort can be difficult to define, though, when the file has multiple columns, uses tab characters as column separators, needs reverse sort order on some columns, and has to be sorted on its columns in non-sequential order.

Assume that we have the following file where each column is separated by a [TAB] character:

Group-ID   Category-ID   Text        Frequency
----------------------------------------------
200        1000          oranges     10
200        900           bananas     5
200        1000          pears       8
200        1000          lemons      10
200        900           figs        4
190        700           grapes      17

I’d like to have this file sorted by the following columns, in this specific order. Note that column 4 is sorted before column 3, and that column 4 is sorted in reverse order:

  • Group ID (integer)
  • Category ID (integer)
  • Frequency, in reverse order (integer)
  • Text (alphanumeric)

I want the file sorted this way:

Group-ID   Category-ID   Text        Frequency
----------------------------------------------
190        700           grapes      17
200        900           bananas     5
200        900           figs        4
200        1000          lemons      10
200        1000          oranges     10
200        1000          pears       8

To sort the file that way, we have to define the sort parameters like this:

sort -t $'\t' -k 1n,1 -k 2n,2 -k 4rn,4 -k 3,3 <my-file>

The first thing we need to do is tell sort to use TAB as the column separator (field delimiter). In Bash and zsh, $'\t' is ANSI-C quoting and expands to a literal TAB character, so we can write:

sort -t $'\t' <my-file>

If our input file were comma-separated, we could have used:

sort -t "," <my-file>
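The $'\t' form works in Bash, ksh93 and zsh, but a plain POSIX sh may not support it. A portable alternative, sketched below, is to capture a TAB with printf and pass that variable to -t. The two small sample files (sample.tsv and sample.csv) are hypothetical and reuse two rows from the example data.

```shell
# Portable way to get a literal TAB character into a variable;
# in Bash/zsh, tab=$'\t' does the same thing.
tab=$(printf '\t')

# Hypothetical sample files: the same two rows, tab- and comma-delimited.
printf '200\t1000\toranges\t10\n190\t700\tgrapes\t17\n' > sample.tsv
printf '200,1000,oranges,10\n190,700,grapes,17\n' > sample.csv

# Numeric sort on column 1, using each file's separator.
sort -t "$tab" -k 1n,1 sample.tsv   # grapes row first, then oranges
sort -t ',' -k 1n,1 sample.csv      # same order for the CSV file
```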

The next step is to define that we want the file sorted by columns 1, 2, 4 and 3, in that particular order. The key argument “-k” allows us to do this. The tricky part is that you have to specify the column index twice to limit the sort key to a single column, e.g. “-k 1,1”. If you only specify it once, like “-k 1”, you’re telling sort to use everything from column 1 to the end of the line as the key, which is not what we want. If you want to sort on columns 1 and 2 together, you’d use “-k 1,2”. To sort on multiple columns, we define the key argument “-k” multiple times. The sort arguments required to sort our file in column order 1, 2, 4 and 3 will therefore look like this:
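To see the difference between “-k 2” and “-k 2,2” in practice, here is a tiny sketch with a hypothetical two-line file (keys.tsv) whose second column is identical on both lines. With “-k 2,2” the keys tie and sort falls back to comparing whole lines; with “-k 2” the third column becomes part of the key and changes the result.

```shell
tab=$(printf '\t')   # same as $'\t' in Bash/zsh

# Hypothetical file: column 2 is equal on both lines, column 3 differs.
printf 'a\t1\tz\nb\t1\ty\n' > keys.tsv

# -k 2,2: the key is column 2 only. The keys tie, so sort falls back to
# comparing whole lines and the "a" line comes first.
sort -t "$tab" -k 2,2 keys.tsv   # a 1 z, then b 1 y

# -k 2: the key runs from column 2 to the end of the line, so column 3
# is part of the key and "y" sorts before "z".
sort -t "$tab" -k 2 keys.tsv     # b 1 y, then a 1 z
```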

sort -t $'\t' -k 1,1 -k 2,2 -k 4,4 -k 3,3 <my-file>

However, we want the 4th column sorted in reverse order. We instruct sort to do so by changing the argument from “-k 4,4” to “-k 4r,4”. The “r” option reverses the sort order for that column only. There’s only one problem left to solve: by default, sort compares numbers as text, so it will sort e.g. the number 10 ahead of 2. We solve this by adding the “n” option, which tells sort to compare a column by its numerical value, e.g. “-k 1n,1”. Note that the “n” option only needs to be attached once, to the number left of the comma. Since the 4th column is sorted both in reverse order and by numerical value, we can combine the options like this: “-k 4rn,4”.
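A quick way to convince yourself of the text-versus-numeric behavior and of per-key options is the sketch below. The files freq.txt and keys2.tsv are hypothetical; sort prints one value per line, and the comments list those lines in order.

```shell
tab=$(printf '\t')   # same as $'\t' in Bash/zsh

# Hypothetical frequency column, one value per line.
printf '10\n5\n8\n2\n' > freq.txt
sort freq.txt        # text order:            10 2 5 8
sort -n freq.txt     # numeric order:         2 5 8 10
sort -rn freq.txt    # reverse numeric order: 10 8 5 2

# Per-key options: column 1 ascending as text, column 2 reverse numeric.
printf 'a\t10\nb\t2\na\t5\n' > keys2.tsv
sort -t "$tab" -k 1,1 -k 2rn,2 keys2.tsv   # a 10, a 5, b 2
```

Note that “r” on the second key does not affect the first key: the “a” rows still come before the “b” row.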

So by adding all of these options together, we end up with:

sort -t $'\t' -k 1n,1 -k 2n,2 -k 4rn,4 -k 3,3 <my-file>
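To try the final command end to end, the sketch below recreates the sample data as a real tab-delimited file (the file name fruits.tsv is hypothetical) and runs the same sort. The header and dashed lines are left out, because sort would otherwise treat them as ordinary data rows. A variable holding a TAB is used instead of $'\t' so the sketch also runs in shells without ANSI-C quoting.

```shell
tab=$(printf '\t')   # same as $'\t' in Bash/zsh

# The sample rows from this post, one tab-delimited line each.
printf '%s\n' \
  "200${tab}1000${tab}oranges${tab}10" \
  "200${tab}900${tab}bananas${tab}5" \
  "200${tab}1000${tab}pears${tab}8" \
  "200${tab}1000${tab}lemons${tab}10" \
  "200${tab}900${tab}figs${tab}4" \
  "190${tab}700${tab}grapes${tab}17" > fruits.tsv

# Group ID and Category ID ascending (numeric), Frequency descending
# (numeric), then Text ascending to break the remaining ties.
# Row order: grapes, bananas, figs, lemons, oranges, pears.
sort -t "$tab" -k 1n,1 -k 2n,2 -k 4rn,4 -k 3,3 fruits.tsv
```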

I hope someone will find this useful. I tested this solution on both Linux and OS X. The documentation for the Unix sort command is available via “man sort” and “info sort”.

Jul 16 2010

ls full path

Published by under Unix/Linux

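How do you get the Unix command ls to show you the full path? Unfortunately, ls has no option that does this directly. However, the commands below work fine and give you what you want: the shell expands $PWD/* into absolute paths before ls runs, and -d makes ls list directories themselves rather than their contents.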

ls -d "$PWD"/*

or

ls -ld "$PWD"/*
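Quoting "$PWD" keeps the directory path as a single word even if it contains spaces, while the /* part is left unquoted so the shell still expands it. Here is a small demonstration in a hypothetical scratch directory whose name contains a space; the file and directory names are made up for illustration.

```shell
# Hypothetical scratch directory with a space in its name.
dir="$(mktemp -d)/my dir"
mkdir "$dir"
cd "$dir"
touch a.txt b.txt
mkdir sub

# Prints three absolute paths ending in /my dir/a.txt, /my dir/b.txt and
# /my dir/sub (the directory itself, not its contents, thanks to -d).
ls -d "$PWD"/*
```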

Jul 07 2010

Window Stuck Under Toolbar in OS X

Published by under OS X / Apple OS

Sometimes a window can get stuck under the top toolbar in OS X. This often happens when I use Citrix in OS X to run Windows applications. When this happens, it’s not possible to grab the window or to close it. A simple solution is to press [fn] [shift] [F2], which moves the application window a bit so that you can grab it.

Apr 14 2010

fatal: git checkout: updating paths is incompatible with switching branches.

Published by under GIT

Using GIT I tried to pull down a new remote branch using:

git checkout --track -b my-branch-name origin/my-branch-name

When I did this I got this error message:

fatal: git checkout: updating paths is incompatible with switching branches.
Did you intend to checkout 'origin/my-branch-name' which can not be resolved as commit?

This error message was a tad confusing. It means that origin/my-branch-name could not be resolved to a commit: my local repository did not yet know about the new remote branch. The solution in my case was simple though. Performing:

git pull

resolved the issue, since git pull starts by fetching from the remote, which downloads the new remote branch. After this I was able to successfully check out the remote branch.
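The sketch below reproduces the situation with two local clones of a bare repository that stands in for the remote server; all directory names and the commit identity are hypothetical. The checkout fails in the clone that has not fetched since the branch was pushed, and succeeds right after “git fetch”, which is the part of “git pull” that matters here. The exact error text depends on the Git version.

```shell
set -e
cd "$(mktemp -d)"
# Hypothetical identity so commits work in a fresh environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A local bare repository plays the role of the remote server.
git init -q --bare server.git

# "theirs" publishes an initial commit; "mine" is cloned after that.
git clone -q server.git theirs 2> /dev/null   # warns: empty repository
git -C theirs commit -q --allow-empty -m "initial commit"
git -C theirs push -q origin HEAD
git clone -q server.git mine

# A new branch is pushed after "mine" was cloned.
git -C theirs checkout -q -b my-branch-name
git -C theirs commit -q --allow-empty -m "work on my-branch-name"
git -C theirs push -q origin my-branch-name

cd mine
# Fails: this clone has never fetched origin/my-branch-name.
git checkout --track -b my-branch-name origin/my-branch-name 2> /dev/null \
  || echo "checkout failed: origin/my-branch-name is unknown here"

# Fetch (the first half of git pull), then the same checkout works.
git fetch -q origin
git checkout -q --track -b my-branch-name origin/my-branch-name
git rev-parse --abbrev-ref HEAD   # my-branch-name
```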

Mar 04 2010

DbVisualizer auto commit problem

Published by under JDBC

I had some issues with DbVisualizer and auto commit. I wanted to be able to turn it off from the SQL commander. The official documentation states that you can do this using:

The Auto Commit setting is enabled by default and can be adjusted in the Connection Properties. You may also adjust the  auto commit state for the SQL editor you are using in the SQL Commander with the following command:

@set autocommit true/false

Unfortunately this didn’t work for me in either 6.5.12 or 7.04 (I’m using OS X and Java 6) against an Oracle 10g database. I got an error alert stating “/application/set autocommit false (No such file or directory)”.
I was finally able to figure out that you can get it to work using:

@set autocommit off/on

I’m not sure if this is a problem that only occurs on OS X.

Feb 04 2010

GIT Fatal You Have not Concluded Your Merge MERGE_HEAD Exists

Published by under GIT

fatal: You have not concluded your merge. (MERGE_HEAD exists)

I got this message when I performed a “git pull”. I searched the Internet for a solution to this problem, and it wasn’t until I found this post that I was able to resolve the issue. The problem was that I:

  1. Performed a “git pull” and the automatic merge failed and I ended up with merge conflicts
  2. I resolved the merge conflicts and added the resolved files back using “git add”
  3. Performed a new “git pull” and got the “Fatal You Have not Concluded Your Merge MERGE_HEAD Exists” error

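Step 3 fails because the merge from step 1 was never concluded: MERGE_HEAD still exists, and starting a new merge would mean merging on top of a dirty index. According to the post, this is a common mistake made by programmers who are used to version control systems where the user follows an “update” and “commit” workflow. The error message itself points at the missing step: after resolving the conflicts and running “git add” in step 2, the merge still had to be concluded with “git commit” before pulling again.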

So how do we resolve this issue? What worked for me was to follow the instructions for how to “Undo a merge or pull inside a dirty work tree” found here.

  1. I used “git reset --merge ORIG_HEAD”
  2. I resolved the merge conflicts again and added the resolved files back using “git add”
  3. I was then finally able to “push” my changes!

According to the documentation, running “git reset --hard ORIG_HEAD” takes you back to where you were before the merge, but you will lose your local changes, which is most likely not what you want. Using “git reset --merge ORIG_HEAD” lets you keep your local changes. You will, however, have to re-resolve the conflicting files.
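The whole sequence can be reproduced in a scratch repository. In this sketch a local branch named “other” (hypothetical) stands in for the remote branch that “git pull” fetches and then merges, so “git merge other” plays the role of the pull. It walks through steps 1 to 3 above, the “git reset --merge ORIG_HEAD” fix, and finally concludes the redone merge with “git commit”, which is what the error message asks for.

```shell
set -e
cd "$(mktemp -d)"
# Hypothetical identity so commits work in a fresh environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Two branches that change the same line of file.txt.
git init -q
echo base > file.txt
git add file.txt
git commit -q -m "base"
git checkout -q -b other
echo other > file.txt
git commit -q -am "other"
git checkout -q -                # back to the original branch
echo mine > file.txt
git commit -q -am "mine"

# Steps 1 and 2: the merge stops with a conflict, which we resolve and stage.
git merge other > /dev/null || true
echo resolved > file.txt
git add file.txt

# Step 3: a second merge (or pull) is refused because MERGE_HEAD exists.
git merge other 2> /dev/null || echo "refused: merge not concluded"

# The fix from the post: undo the unfinished merge...
git reset -q --merge ORIG_HEAD
# ...redo it, re-resolve the conflict, and conclude it with a commit.
git merge other > /dev/null || true
echo resolved > file.txt
git add file.txt
git commit -q -m "Merge branch 'other'"
```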

Some additional information on this topic can be found here.

Dec 04 2009

SQL*Loader-522: lfiopn failed for file (loader.log)

Published by under Oracle

I used the Oracle SQL Loader to push some data into a table and got the following error: SQL*Loader-522: lfiopn failed for file (loader.log)
This somewhat cryptic error message turned out to mean that SQL*Loader did not have write permission in the working directory, i.e. the directory where I executed the sqlldr command, so it could not create loader.log there. Once I fixed the directory permissions, everything worked just fine.
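Since the fix was a directory permission, a quick pre-flight check before starting sqlldr can save you from this error. The sketch below defines a hypothetical helper, check_log_dir, that only uses the standard test operators -d and -w to check whether the current user can write to a directory (the current directory by default); it does not call SQL*Loader itself.

```shell
# Hypothetical helper: succeed if directory $1 (default: the current
# directory) exists and is writable, so a log file can be created there.
check_log_dir() {
  log_dir=${1:-.}
  if [ -d "$log_dir" ] && [ -w "$log_dir" ]; then
    echo "ok: $log_dir is writable"
  else
    echo "cannot write to $log_dir; fix its permissions or run sqlldr elsewhere" >&2
    return 1
  fi
}

# Check the directory you are about to run sqlldr from.
check_log_dir .
```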