Thursday, November 19, 2009

HDFS Permissions issues

If you are running into permissions issues in an HDFS installation (e.g. when you try to write to HDFS and it does not seem to work), you may want to *relax* the permission checks with the following setting in hdfs-site.xml on all HDFS cluster nodes (starting from release 0.20). Of course, the namenode needs to be restarted for the change to take effect.

<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>


Important - relaxing the checks this way opens a gaping security hole, and the HDFS team is actively working on better permission-based access rules. This change is best suited to the early stages of development, to get started, and should be revisited soon after.



Thursday, July 23, 2009

iBator eclipse plugin

If, like me, you are into iBatis, check out http://ibatis.apache.org/ibator.html. iBator comes with an Eclipse plugin that is quite useful when you are working with iBatis configuration files.

Friday, July 17, 2009

Handling signals in Java

Signals are, by definition, specific to the underlying implementation (read: operating system).

Since Java is a platform-independent language, writing platform-specific code is generally discouraged, except in the rarest cases where there is a clear business justification for it.

Java does have an undocumented API for signals that lets us override the default handler for a given signal. This can be useful, and even necessary, when we need to release resources (connections etc.) on shutdown.

Note: When overriding a default signal handler, it is important to delegate to the previous handler implementation once our own clean-up finishes (hint: save the old handler when overriding it).


package experiment;

import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

import sun.misc.Signal;
import sun.misc.SignalHandler;

public class CustomSignalHandler implements SignalHandler {

    private static final Logger LOGGER = Logger.getLogger(CustomSignalHandler.class.getName());

    private static final Map<Signal, SignalHandler> handlers = new HashMap<Signal, SignalHandler>();

    @Override
    public void handle(Signal signal) {
        LOGGER.info("received " + signal);

        // Delegate to the previously installed handler after our clean-up is done.
        SignalHandler old = handlers.get(signal);
        if (old != null) {
            old.handle(signal);
        }
    }

    /**
     * Important: this API is not portable; the set of valid signal names
     * depends on the underlying operating system.
     *
     * @param signalName name of the signal, without the "SIG" prefix (e.g. "TERM")
     * @param signalHandler the handler to install
     */
    public static void delegateHandler(final String signalName,
            final SignalHandler signalHandler) {
        Signal signal = new Signal(signalName);
        // Save the old handler so that handle() can chain to it later.
        SignalHandler oldHandler = Signal.handle(signal, signalHandler);
        handlers.put(signal, oldHandler);
    }

    public static void main(String[] args) {
        final int LONG_TIME = 50000;

        SignalHandler example = new CustomSignalHandler();
        delegateHandler("TERM", example);
        delegateHandler("INT", example);
        delegateHandler("ABRT", example);

        try {
            Thread.sleep(LONG_TIME);
        } catch (InterruptedException ie) {
            ie.printStackTrace();
        }
    }
}


Advanced Linux Programming

There is a free e-book on advanced Linux programming available at http://www.advancedlinuxprogramming.com/alp-folder.

The chapter list is as follows.

  • Chapter 01 - Advanced Unix Programming with Linux
  • Chapter 02 - Writing Good GNU/Linux Software
  • Chapter 03 - Processes
  • Chapter 04 - Threads
  • Chapter 05 - Interprocess Communication
  • Chapter 06 - Mastering Linux
  • Chapter 07 - The /proc File System
  • Chapter 08 - Linux System Calls
  • Chapter 09 - Inline Assembly Code
  • Chapter 10 - Security
  • Chapter 11 - A Sample GNU/Linux Application
Among these, chapter 7 (and, to some extent, chapter 8) is especially useful and well-written, with some hidden treasures even for a seasoned Linux professional.




Wednesday, July 1, 2009

Segmentation Fault - but no core dump ?

A binary I was working on died with a "Segmentation Fault", but left no core dump.

I started with the usual suspects:
* Checked the permissions on the directory, to see whether the user could write the core file there. It was fine.
* Checked the /tmp directory, just in case.

Then I realized I was working in bash, which sets the core file size limit to 0 by default.

$ ulimit -a
core file size          (blocks, -c)   0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 20

..


I reset it as below:

$ ulimit -c unlimited


Then I ran the executable again, and this time the core dump was available for processing.




Wednesday, February 4, 2009

Linux AIO Examples

Tim Jones gives an excellent introduction to aio, and the motivation behind it, here. All 2.6 kernels now ship aio as a standard feature.
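The kernel aio interface itself is a C-level API, but the pattern it enables - submit an I/O request, keep working, collect the completion later - can be sketched in plain Java with a thread pool. The class and file names below are hypothetical; this is an illustration of the pattern, not of the kernel API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncReadSketch {

    // Submit a whole-file read and return immediately; the Future completes
    // when the background read finishes, mimicking an aio completion.
    public static Future<byte[]> readAsync(final Path path) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            return pool.submit(new Callable<byte[]>() {
                public byte[] call() throws IOException {
                    return Files.readAllBytes(path);
                }
            });
        } finally {
            // The pending read still runs to completion; the thread is then freed.
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path path = Files.createTempFile("aio-demo", ".txt");
        Files.write(path, "hello, aio".getBytes("UTF-8"));

        Future<byte[]> pending = readAsync(path);
        // ... the caller can do other work here while the read is in flight ...
        System.out.println(new String(pending.get(), "UTF-8")); // prints "hello, aio"
    }
}
```

The real aio API goes further (completion notification via signals or threads, no thread per request), but the submit/complete split is the same idea.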

Thursday, January 29, 2009

Elastic Block Store

Amazon has come up with another *potentially* low-margin, high-volume business by announcing Elastic Block Store today.
The most interesting feature is, of course, block-level storage. The S3 service is already extremely popular, thanks to getting away from the relational model (which can sometimes be overkill) and reducing the IT-management headache for a potential entrepreneur.

The EC2 service is, no doubt, much better for creating your own instance of an image from scratch and using it. But despite the free REST requests to S3, the absence of persistence on the image instance was a drawback.

EBS provides block-level storage volumes that can be attached to an EC2 instance. As opposed to the other tools, which have a much steeper adoption curve (S3 needs some sort of wrapper around REST, the popular one being JetS3t), this is probably as simple as it gets, and hence it might see a higher adoption rate than the rest.

I have not had the time to compare the pricing of EBS against the rest, but my guess is that people probably would not mind paying up, given how much more usable it makes EC2 instances.

Graphite - Visualization Tool

Orbitz, the popular travel planning website, recently released some of its (previously proprietary) projects as open source.

Graphite is a scalable, real-time graph visualization tool, released under the Apache License.

Some of the interesting aspects of the same (courtesy, the FAQ of the software):


  • Written in Python, based on the Django project.
  • The rendering engine is based on the Cairo framework, the same rendering engine used for the rendering of content in the Firefox 3 browser.
  • The input data has to be a numeric time series. (This seems intuitive: in a graph visualization scheme, differences ought to be based on some quantitative measure eventually.) Of course, any categorical metric can be mapped to preset numerical values to achieve a similar effect.
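As a sketch of that last point, here is a minimal (hypothetical) mapping of a categorical status metric onto preset numbers, so that it could be fed to a numeric-only store like Graphite. The class name, states, and level values are illustrative, not anything Graphite prescribes:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StatusSeries {

    // Hypothetical preset mapping: categorical service states to numbers.
    private static final Map<String, Integer> LEVELS = new LinkedHashMap<String, Integer>();
    static {
        LEVELS.put("DOWN", 0);
        LEVELS.put("DEGRADED", 1);
        LEVELS.put("UP", 2);
    }

    public static int encode(String status) {
        Integer level = LEVELS.get(status);
        if (level == null) {
            throw new IllegalArgumentException("unknown status: " + status);
        }
        return level;
    }

    public static void main(String[] args) {
        // A categorical series becomes a numeric one, point by point.
        String[] observed = { "UP", "UP", "DEGRADED", "DOWN", "UP" };
        StringBuilder series = new StringBuilder();
        for (String s : observed) {
            series.append(encode(s)).append(' ');
        }
        System.out.println(series.toString().trim()); // prints "2 2 1 0 2"
    }
}
```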



Graphite seems to achieve its scalability by storing entries in a distributed in-memory database, similar to what LiveJournal implements using the memcached service. More recently, Microsoft has started offering Velocity, a competing product in the same space (with subtle differences, which I will cover later).

Ct - Programming language for multi-core processor

With multi-core processors becoming the norm, the responsibility for exploiting parallelism and improving performance has shifted to the software rather than the hardware.

Intel has recently come out with a prototype implementation of Ct, a new programming language for multi-core processors. As per the release notes, the learning curve of Ct is expected to be smooth, since the fundamental language constructs are based on C/C++, with additional language features that let the programmer express parallelism.

A brief introduction to the language constructs is available here.

With multi-core systems, it obviously makes more sense to extract and specify data-level parallelism (plus the related instructions), as opposed to instruction-level parallelism only, to get the best results. New constructs are available in the language to specify this.


  • A new generic vector type (TVEC) that lives in the managed space. It is important to note that a TVEC can be either a flat vector or a multi-dimensional one.

  • Restricted operator overloading on TVEC objects, with the important restriction that only operators with no side-effects are allowed.
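Ct itself is not something I can show here, but as a rough Java analogue of what a single TVEC elementwise operation expresses, here is a sketch that splits an elementwise vector addition across worker threads. The class name and the chunking scheme are illustrative only; Ct would express this in one line and handle the partitioning itself:

```java
import java.util.concurrent.CountDownLatch;

public class DataParallelAdd {

    // Elementwise c = a + b, with the index range split across worker
    // threads -- the data-level parallelism a TVEC operation captures.
    public static double[] add(final double[] a, final double[] b, int workers)
            throws InterruptedException {
        final double[] c = new double[a.length];
        final CountDownLatch done = new CountDownLatch(workers);
        int chunk = (a.length + workers - 1) / workers;
        for (int w = 0; w < workers; w++) {
            final int from = w * chunk;
            final int to = Math.min(from + chunk, a.length);
            new Thread(new Runnable() {
                public void run() {
                    for (int i = from; i < to; i++) {
                        c[i] = a[i] + b[i];
                    }
                    done.countDown();
                }
            }).start();
        }
        done.await();
        return c;
    }

    public static void main(String[] args) throws InterruptedException {
        double[] a = { 1, 2, 3, 4 };
        double[] b = { 10, 20, 30, 40 };
        double[] c = add(a, b, 2);
        System.out.println(java.util.Arrays.toString(c)); // prints "[11.0, 22.0, 33.0, 44.0]"
    }
}
```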



As a proof of concept, the examples in the tutorial cover the Black-Scholes option pricing model and the convolution operator (widely applied in computer vision / image processing applications).


I have not been able to confirm whether the implementation and runtime have been made available to the public yet. One of the components, Threading Building Blocks, has been available as an open source project for some time though.


This release raises some interesting questions.

  • The last C++ standard (C++03) was written for a single-threaded abstract machine, and threading is still not part of the current standard (current, as supported by the compilers). With fragmented threading libraries across platforms and implementations, portability has always been an issue for threaded C++ code. More recently, with Boost Threads providing a nice wrapper over the implementation-specific thread libraries, it is becoming less of a concern, and there is a very good chance that most of these primitives / APIs will make it into the upcoming C++0x standard as well. Given that standardizing thread support in the language is coming a little late, and that C++ look-alikes specific to multi-core processors are being pushed by the architecture vendors themselves, what will be the first choice of technology for developers implementing high-frequency applications?

  • Functions with no side effects and list comprehensions are first-class citizens in the functional programming world. More specifically, I have recently been fascinated by the Erlang programming language, with its native constructs supporting concurrency (no shared memory, thanks) based on message passing. So, can the job of extracting better performance from multi-core processors be split between a robust interpreter / compiler for a functional language on one side and the functional programmer on the other?



We will have to wait and see how things take shape on the scenarios above.

Tuesday, January 27, 2009

JMS (ActiveMQ) using Spring

For a recent project I wanted to get started with JMS implementations, and finally settled on ActiveMQ. I chose the Spring framework because of the range of integration options it gives with the other parts of the stack.

Here is a sample code fragment. Prerequisites: download Apache ActiveMQ 5.2.0 and Spring JMS 2.5.6.A (use Ivy against the Spring repository to grab it).

Launch the ActiveMQ binary before running the program below. The binary is usually available at $ACTIVEMQ_HOME/bin/activemq, and it starts a TCP listening endpoint using the OpenWire protocol. You will see lines similar to the following:

INFO  TransportServerThreadSupport   - Listening for connections at: tcp://hostname:61616
INFO  TransportConnector             - Connector openwire Started


package mymq;

import java.io.Serializable;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.Session;

import org.apache.activemq.ActiveMQConnectionFactory;
import org.apache.activemq.command.ActiveMQObjectMessage;
import org.springframework.jms.core.JmsTemplate;
import org.springframework.jms.core.MessageCreator;

public class Producer {

    public static class FlyWeight implements Serializable {

        private final String msg;

        public FlyWeight(String msg) {
            this.msg = msg;
        }

        @Override
        public String toString() {
            return msg;
        }
    }

    public static void main(String[] args) {
        Producer prod = new Producer();
        prod.startProducer();
        prod.startConsumer();
    }

    public void startProducer() {
        service.submit(new Runnable() {

            public void run() {
                try {
                    JmsTemplate template = new JmsTemplate(getConnectionFactory());
                    template.afterPropertiesSet();
                    final DateFormat fmt = new SimpleDateFormat("HH:mm:ss");
                    while (true) {
                        Thread.sleep(1000 * 2);
                        // Send a serializable object message every two seconds.
                        template.send(QUEUE_NAME, new MessageCreator() {

                            @Override
                            public Message createMessage(Session session) throws JMSException {
                                ActiveMQObjectMessage msg = new ActiveMQObjectMessage();
                                msg.setObject(new FlyWeight(fmt.format(new Date())));
                                return msg;
                            }
                        });
                    }
                } catch (Exception ex) {
                    // Do not swallow failures silently.
                    ex.printStackTrace();
                }
            }
        });
    }

    public void startConsumer() {
        service.submit(new Runnable() {
            public void run() {
                try {
                    JmsTemplate template = new JmsTemplate(getConnectionFactory());
                    template.afterPropertiesSet();
                    while (true) {
                        Thread.sleep(1000 * 2);
                        // Blocks until a message arrives on the queue.
                        Message msg = template.receive(QUEUE_NAME);
                        if (msg instanceof ActiveMQObjectMessage) {
                            ActiveMQObjectMessage text = (ActiveMQObjectMessage) msg;
                            System.out.println(text.getObject());
                        } else {
                            System.err.println("Message type invalid " + msg.getClass());
                        }
                    }
                } catch (Exception ex) {
                    // Do not swallow failures silently.
                    ex.printStackTrace();
                }
            }
        });
    }

    static ConnectionFactory getConnectionFactory() {
        // Important: the 'activemq' script must be running for this program
        // to work. By default, ActiveMQ binds a TCP listener (OpenWire
        // protocol) on port 61616.
        ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory();
        factory.setBrokerURL("tcp://localhost:61616");
        return factory;
    }

    static final String QUEUE_NAME = "MyQueue";

    static ExecutorService service = Executors.newFixedThreadPool(2);
}