Hi, after some holidays I am back. This post is about how to use the spring-data technology with Apache Hadoop, or in other words, how to write a map reduce task using Spring. The idea is to show how to focus on the most important part of Apache Hadoop: writing the map reduce task.

So, let's begin with the project. The most impatient (like myself) can find the project sources here.

The example is a Maven project, so we can see the dependencies in the pom.xml file:

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>es.aironman.samples</groupId>
    <artifactId>my-spring-data-mapreduce</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <description>my sample about how to code a map reduce task for hadoop using spring-data</description>

    <properties>
        <apache.hadoop.version>1.0.3</apache.hadoop.version>
        <slf4j.version>1.6.1</slf4j.version>
        <spring.version>3.1.2.RELEASE</spring.version>
        <spring.data.hadoop.version>1.0.0.RELEASE</spring.data.hadoop.version>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <dependency>
            <groupId>commons-lang</groupId>
            <artifactId>commons-lang</artifactId>
            <version>2.6</version>
        </dependency>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-beans</artifactId>
            <version>${spring.version}</version>
        </dependency>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-core</artifactId>
            <version>${spring.version}</version>
        </dependency>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-context-support</artifactId>
            <version>${spring.version}</version>
        </dependency>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-context</artifactId>
            <version>${spring.version}</version>
        </dependency>
        <dependency>
            <groupId>cglib</groupId>
            <artifactId>cglib</artifactId>
            <version>2.2.2</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.data</groupId>
            <artifactId>spring-data-hadoop</artifactId>
            <version>${spring.data.hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>${apache.hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>${slf4j.version}</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <version>${slf4j.version}</version>
        </dependency>
        <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
            <version>1.2.16</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.9</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.mockito</groupId>
            <artifactId>mockito-core</artifactId>
            <version>1.8.5</version>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <finalName>my-spring-data-mapreduce</finalName>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>2.3.2</version>
                <configuration>
                    <source>1.6</source>
                    <target>1.6</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.2.2</version>
                <configuration>
                    <descriptors>
                        <descriptor>src/main/assembly/assembly.xml</descriptor>
                    </descriptors>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <version>2.3.1</version>
                <configuration>
                    <archive>
                        <manifest>
                            <addClasspath>true</addClasspath>
                            <classpathPrefix>lib/</classpathPrefix>
                            <mainClass>net.petrikainulainen.spring.data.apachehadoop.Main</mainClass>
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-site-plugin</artifactId>
                <version>3.0</version>
                <configuration>
                    <reportPlugins>
                        <plugin>
                            <groupId>org.codehaus.mojo</groupId>
                            <artifactId>cobertura-maven-plugin</artifactId>
                            <version>2.5.1</version>
                        </plugin>
                    </reportPlugins>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

Looking at this file, you could tell me that I am not using the latest versions of the dependencies! I promise to update this over time, or you can send me a pull request to the GitHub project ;).

In the applicationContext.xml file we can see how the spring-data project declares which map reduce job is going to be executed.

<hdp:configuration>
    fs.default.name=${fs.default.name}
    mapred.job.tracker=${mapred.job.tracker}
</hdp:configuration>
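Besides the configuration block, the same file has to declare the job itself and a runner that triggers it. With Spring for Apache Hadoop 1.0.x that part looks roughly like the fragment below; the bean ids here are made up for illustration, only the mapper and reducer class names and the ${input.path}/${output.path} placeholders come from this project:

<!-- Loads fs.default.name, mapred.job.tracker, input.path and output.path -->
<context:property-placeholder location="classpath:application.properties"/>

<!-- The map reduce job: mapper, reducer and the HDFS input/output paths -->
<hdp:job id="dominiosRegistradorJob"
         input-path="${input.path}"
         output-path="${output.path}"
         mapper="es.aironman.samples.spring.data.hadoop.DominiosRegistradorMapper"
         reducer="es.aironman.samples.spring.data.hadoop.DominiosRegistradorReducer"/>

<!-- Runs the job as soon as the application context starts -->
<hdp:job-runner id="dominiosRegistradorJobRunner" job-ref="dominiosRegistradorJob" run-at-startup="true"/>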

There is an application.properties file with the necessary configuration data: where the HDFS (Hadoop Distributed File System) is, where the Hadoop job tracker is listening, the input path with the data to be filtered, and the output path for the result. Please do not forget to erase the output directory if you launch the map reduce task more than once.

application.properties

fs.default.name=hdfs://localhost:9000
mapred.job.tracker=localhost:9001

input.path=/input/
output.path=/output/

You may need to replace localhost with the Hadoop machine's IP address, so check it out!

Now the map reduce classes. They are the same as in another project I have talked about on this blog, so I will not delve deeper into them here.

I think the code is already well documented, so, the mapper class is:


package es.aironman.samples.spring.data.hadoop;

import java.io.IOException;

import org.apache.commons.lang.math.NumberUtils;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class DominiosRegistradorMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {

    private static final String SEPARATOR = ";";

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException,
            InterruptedException {

        final String[] values = value.toString().split(SEPARATOR);
        String agent;
        String totalDomains;
        for (int i = 0; i < values.length; i++) {

            agent = format(values[1]);
            totalDomains = format(values[2]);

            if (NumberUtils.isNumber(totalDomains)) {
                context.write(new Text(agent), new DoubleWritable(NumberUtils.toDouble(totalDomains)));
            }

        } // end for
    }

    private String format(String value) {
        return value.trim();
    }

}

You can see that the data file has this format:

id ; Agente Registrador ; Total dominios;
1 ; 1&1 Internet ; 382.972;
36 ; WEIS CONSULTING ; 4.154;
71 ; MESH DIGITAL LIMITED ; 910;

The idea of the mapper is to split every line by ";", get each Agente Registrador (registrar agent) and each Total dominios (total domains), and write them to the Hadoop context. This is a very simple Hadoop task; in this phase you can choose which registrar agents you want to write to the context. For simplicity, I chose to write every agent with its total domains to the context.

Now the reducer class:


package es.aironman.samples.spring.data.hadoop;

import java.io.IOException;
import java.text.DecimalFormat;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/***
 *
 * This Reducer keeps the largest value of total registered domains per agent.
 * @author aironman
 *
 */
public class DominiosRegistradorReducer extends Reducer<Text, DoubleWritable, Text, Text> {

    private final DecimalFormat decimalFormat = new DecimalFormat("#.###");

    public void reduce(Text key, Iterable<DoubleWritable> totalDominiosValues, Context context)
            throws IOException, InterruptedException {
        double _maxTotalDomains = 0.0;
        for (DoubleWritable totalDominiosValue : totalDominiosValues) {
            double _total = totalDominiosValue.get();

            _maxTotalDomains = Math.max(_maxTotalDomains, _total);
        }
        context.write(key, new Text(decimalFormat.format(_maxTotalDomains)));

    }

}

As you can guess, in this phase I am keeping only the maximum total domains of each agent from the Hadoop context. Maybe you want to calculate the minimum or the average instead; for that you would write a custom Writable, but that is beyond the scope of this post. Keep an eye on this post for future updates.
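Just to illustrate the point, here is a minimal sketch of a reducer variant that keeps the average instead of the maximum. This class is not part of the project, and since no combiner is involved it gets by without a custom Writable:

package es.aironman.samples.spring.data.hadoop;

import java.io.IOException;
import java.text.DecimalFormat;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * Illustrative variant (not in the project): emits the average number of
 * registered domains per agent instead of the maximum.
 */
public class DominiosRegistradorAvgReducer extends Reducer<Text, DoubleWritable, Text, Text> {

    private final DecimalFormat decimalFormat = new DecimalFormat("#.###");

    @Override
    public void reduce(Text key, Iterable<DoubleWritable> totalDominiosValues, Context context)
            throws IOException, InterruptedException {
        double sum = 0.0;
        long count = 0;
        for (DoubleWritable totalDominiosValue : totalDominiosValues) {
            sum += totalDominiosValue.get();
            count++;
        }
        // Guard against an empty group, although Hadoop never calls reduce without values.
        double average = count == 0 ? 0.0 : sum / count;
        context.write(key, new Text(decimalFormat.format(average)));
    }
}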

And that's it! You can assemble the jar with this command:

mvn clean assembly:assembly

If everything is ok, you should see output like this:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building my-spring-data-mapreduce 0.0.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.4.1:clean (default-clean) @ my-spring-data-mapreduce ---
[INFO] Deleting /Users/aironman/Documents/ws-spring-data-hadoop/my-spring-data-mapreduce/target
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building my-spring-data-mapreduce 0.0.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] >>> maven-assembly-plugin:2.2.2:assembly (default-cli) @ my-spring-data-mapreduce >>>
[INFO]
[INFO] --- maven-resources-plugin:2.5:resources (default-resources) @ my-spring-data-mapreduce ---
[debug] execute contextualize
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 3 resources
[INFO]
[INFO] --- maven-compiler-plugin:2.3.2:compile (default-compile) @ my-spring-data-mapreduce ---
[INFO] Compiling 3 source files to /Users/aironman/Documents/ws-spring-data-hadoop/my-spring-data-mapreduce/target/classes
[INFO]
[INFO] --- maven-resources-plugin:2.5:testResources (default-testResources) @ my-spring-data-mapreduce ---
[debug] execute contextualize
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 0 resource
[INFO]
[INFO] --- maven-compiler-plugin:2.3.2:testCompile (default-testCompile) @ my-spring-data-mapreduce ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-surefire-plugin:2.10:test (default-test) @ my-spring-data-mapreduce ---
[INFO] Surefire report directory: /Users/aironman/Documents/ws-spring-data-hadoop/my-spring-data-mapreduce/target/surefire-reports

-------------------------------------------------------
 T E S T S
-------------------------------------------------------

Results :

Tests run: 0, Failures: 0, Errors: 0, Skipped: 0

[INFO]
[INFO] --- maven-jar-plugin:2.3.1:jar (default-jar) @ my-spring-data-mapreduce ---
[INFO] Building jar: /Users/aironman/Documents/ws-spring-data-hadoop/my-spring-data-mapreduce/target/my-spring-data-mapreduce.jar
[INFO]
[INFO] <<< maven-assembly-plugin:2.2.2:assembly (default-cli) @ my-spring-data-mapreduce <<<
[INFO]
[INFO] --- maven-assembly-plugin:2.2.2:assembly (default-cli) @ my-spring-data-mapreduce ---
[INFO] Reading assembly descriptor: src/main/assembly/assembly.xml
[INFO] Building zip: /Users/aironman/Documents/ws-spring-data-hadoop/my-spring-data-mapreduce/target/my-spring-data-mapreduce-bin.zip
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 4.245s
[INFO] Finished at: Wed Aug 13 11:59:42 CEST 2014
[INFO] Final Memory: 16M/315M
[INFO] ------------------------------------------------------------------------

The assembly phase will give you a zip file; unzip it on your Hadoop cluster and launch the startup.sh script.
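The zip is produced by the assembly descriptor referenced in the pom, src/main/assembly/assembly.xml, which is not listed in this post. A minimal descriptor that packs the application jar next to its dependencies under lib/ (matching the classpathPrefix of the jar plugin) could look roughly like this; the id, formats and file sets are assumptions, not the project's actual descriptor:

<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2 http://maven.apache.org/xsd/assembly-1.1.2.xsd">
    <id>bin</id>
    <formats>
        <format>zip</format>
    </formats>
    <includeBaseDirectory>false</includeBaseDirectory>
    <dependencySets>
        <!-- Runtime dependencies go to lib/, matching the manifest classpath of the jar plugin -->
        <dependencySet>
            <outputDirectory>lib</outputDirectory>
            <useProjectArtifact>false</useProjectArtifact>
            <scope>runtime</scope>
        </dependencySet>
    </dependencySets>
    <fileSets>
        <!-- The application jar itself, built by the jar plugin -->
        <fileSet>
            <directory>${project.build.directory}</directory>
            <outputDirectory>/</outputDirectory>
            <includes>
                <include>*.jar</include>
            </includes>
        </fileSet>
    </fileSets>
</assembly>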

Enjoy!

Update

This is the link to the spring-data project.

First of all, forgive me for this, but I am fuming. Let's see, ladies and gentlemen who work for temp/recruiting agencies and the like: I am not going to take any more of those tests that supposedly measure real programming knowledge. They are not realistic; they are usually based on the typical little programs we wrote at university to learn how to program. They do not appeal to me, they bore me, and the worst part is that they do not measure anything at all.

To demonstrate my skills I have my GitHub account, where I keep the code for what I know how to do and what I like most, namely programming backend systems that run on servers 24/7/365. What is that? Well, think of all the apps out there, the ones we install on our Android or iPhone; those apps need to talk to at least one server to do their job. At least the more complex applications do, because not all of them need to communicate with the outside world. What I know how to do, better or worse than others, is program applications for the application server(s).

I also know how to write iOS applications. In theory, knowing Java, I could program for Android, but I have an iPhone 4. I understand HTML5/CSS3/jQuery code and I love the open source movement, so I will be more receptive to offers if I am going to work with open source, although I am not a zealot about it. If something closed source is better than the open source alternative, I acknowledge it and I use it; in fact, for my day-to-day work I use a late 2013 MacBook Pro Retina, and I also use Ubuntu or Red Hat when I want to work on something related to big data, such as programming map reduce tasks on Apache Hadoop and/or Apache Spark.

The idea behind this project is to provide an example of a secured web service exposed in the REST architectural style, built with the command pattern and the composite command pattern, connected to both a mongodb instance and a mysql instance: the best of both worlds.
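The command part of the service is not shown in this post, but the idea can be sketched in a few lines (the interface and class names below are illustrative, not the ones used in the repository):

// Illustrative sketch of the command / composite command idea used in the service layer.
// None of these names are taken from the repository.
import java.util.ArrayList;
import java.util.List;

interface Command<T> {
    T execute();
}

// A composite command runs a list of commands as one unit and collects their results.
class CompositeCommand<T> implements Command<List<T>> {

    private final List<Command<T>> commands = new ArrayList<Command<T>>();

    public CompositeCommand<T> add(Command<T> command) {
        commands.add(command);
        return this;
    }

    @Override
    public List<T> execute() {
        List<T> results = new ArrayList<T>();
        for (Command<T> command : commands) {
            results.add(command.execute());
        }
        return results;
    }
}

The appeal of the composite is that a single REST call can fan out to several commands, for instance one hitting mysql and another hitting mongodb.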

There are at least two ways in the Java world to connect to a mongodb instance: you can choose the spring-data-mongodb project, or Morphia. Both of them are very easy to use; you only need to create an interface that extends something and that's it!

Using the Morphia way, you have to declare an interface like this:

package com.aironman.sample.dao;

import org.bson.types.ObjectId;

import com.aironman.sample.dao.model.Employee;

/**
* Date: 12 June 2014
*
* @author Konrad Malawski
* @author Alonso Isidoro
*/
public interface EmployeeDao extends org.mongodb.morphia.dao.DAO<Employee, ObjectId> {
}

And its implementation file:

package com.aironman.sample.dao;

import org.bson.types.ObjectId;
import org.mongodb.morphia.Morphia;
import org.mongodb.morphia.dao.BasicDAO;

import com.aironman.sample.dao.EmployeeDao;
import com.aironman.sample.dao.model.Employee;
import com.mongodb.Mongo;

/**
 * Date: 12 June 2014
 *
 * @author Konrad Malawski
 * @author Alonso Isidoro
 */
public class EmployeeDaoMorphiaImpl extends BasicDAO<Employee, ObjectId> implements EmployeeDao {

    public EmployeeDaoMorphiaImpl(Morphia morphia, Mongo mongo, String dbName) {
        super(mongo, morphia, dbName);
    }
}

Super easy!

What if you want to use the spring-data-mongodb project? An interface, and that is all!

package com.aironman.sample.mongo.repository;

import org.springframework.data.mongodb.repository.MongoRepository;

import com.aironman.sample.mongo.documents.Role;

public interface RoleRepository extends MongoRepository<Role, String> {
}

 

And what about JPA?

package com.aironman.sample.dao;

import com.aironman.sample.dao.model.User;

import org.springframework.data.repository.CrudRepository;

/**
 * User: aironman
 * Date: 4 June 2014
 */
public interface UserDao extends CrudRepository<User, Long> {
}

The most important thing when using this NoSQL technology is to design the mongodb document wisely (it is stored in JSON format, don't forget that), and depending on the wrapper technology chosen, Spring or Morphia, the way to build one differs. For example, the Morphia document:

Employee class, modeled with morphia:

@Entity(value = "employees", noClassnameStored = true)
public class Employee {

    @Id
    private ObjectId id;

    private String firstName;
    private String lastName; // value types are automatically persisted

    Long salary; // only non-null values are stored

    @Embedded
    Address address;

    @Reference
    Employee manager; // refs are stored*, and loaded automatically

    @Reference
    List<Employee> underlings; // interfaces are supported

//    @Serialized
//    EncryptedReviews encryptedReviews; // stored in one binary field

    @Property("started")
    Date startDate; // fields can be renamed

    @Property("left")
    Date endDate;

    @Indexed
    boolean active = false; // fields can be indexed for better performance

    @NotSaved
    String readButNotStored; // fields can be read, but not saved

    @Transient
    int notStored; // fields can be ignored (load/save)

    transient boolean stored = true; // not @Transient, will be ignored by Serialization/GWT for example

    // getters and setters

}

 

Now a spring data document class:

@Document
public class Role {

    @Id
    private String id;

    public Role() {
        super();
    }

    public Role(String id) {
        super();
        this.setId(id);
    }

    // getters, setters, hashCode and equals methods...

}

What are the differences? The annotation: org.springframework.data.mongodb.core.mapping.Document for spring-data and org.mongodb.morphia.annotations.Entity for Morphia, that's all.

The JPA POJO used in this example is the User class, with a different @Entity annotation.

@Entity
public class User {

    @Id
    @GeneratedValue
    private Long id;

    private String firstName;
    private String lastName;
    private String email;

    // getters and setters

}

That is the difficult part; enjoy the rest!

Alonso

Links

http://projects.spring.io/spring-data-mongodb/

https://github.com/mongodb/morphia

The source code is located at https://github.com/alonsoir/mycxf-mongodb-morphia-mysql-sample

Last week I had an interview with a big video game company, King, probably the casual video games company. The point is they want somebody with strong backend skills, so here I am!, I thought. I have some skills in the backend layer: I know the Spring framework very well, ORM, SQL, NoSQL, performance, multithreading, asynchronous tasks, big data technology, etc. Those were my thoughts; I had an opportunity, but they require know-how about PicoContainer. Bad luck, Alonso...

Well, now I know that I need to learn something about PicoContainer, so stay tuned for the next post on this interesting technology.

PS

I am currently still available for hire.

This is a draft of my next task: periodically query Twitter with the twitter4j API for relevant things, starting with my timeline and the trending topics. I am going to create a web service and then integrate that functionality, via Spring Integration, with a RabbitMQ topic and a websocket managed by a controller, so I can display the relevant info in real time in a browser.
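As a starting point, the polling side with twitter4j can be as small as this sketch (it assumes the consumer key/secret and access token are already configured, for example in a twitter4j.properties file, and uses WOEID 1 for worldwide trends):

import twitter4j.ResponseList;
import twitter4j.Status;
import twitter4j.Trend;
import twitter4j.Trends;
import twitter4j.Twitter;
import twitter4j.TwitterException;
import twitter4j.TwitterFactory;

public class TwitterPoller {

    public static void main(String[] args) throws TwitterException {
        // Credentials are read from twitter4j.properties on the classpath.
        Twitter twitter = TwitterFactory.getSingleton();

        // My timeline.
        ResponseList<Status> timeline = twitter.getHomeTimeline();
        for (Status status : timeline) {
            System.out.println("@" + status.getUser().getScreenName() + ": " + status.getText());
        }

        // Worldwide trending topics (WOEID 1).
        Trends trends = twitter.getPlaceTrends(1);
        for (Trend trend : trends.getTrends()) {
            System.out.println("trending: " + trend.getName());
        }
    }
}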

Stay tuned!

Update, 21 May 2014

Twitter has a very restrictive policy about using its API, which is usual, but I consider it very restrictive because I am getting some weird exceptions. A few days ago I did not get any of these, but now I think I am banned! grrrrr

 

Failed to delete status: 401:Authentication credentials (https://dev.twitter.com/pages/auth) were missing or incorrect. Ensure that you have set valid consumer key/secret, access token/secret, and the system clock is in sync.
message - Could not authenticate you
code - 32

401:Authentication credentials (https://dev.twitter.com/pages/auth) were missing or incorrect. Ensure that you have set valid consumer key/secret, access token/secret, and the system clock is in sync.
message - Could not authenticate you
code - 32

Relevant discussions can be found on the Internet at:
http://www.google.co.jp/search?q=c8fb4e9c or
http://www.google.co.jp/search?q=7bffc794
TwitterException{exceptionCode=[c8fb4e9c-7bffc794], statusCode=401, message=Could not authenticate you, code=32, retryAfter=-1, rateLimitStatus=null, version=3.0.6-SNAPSHOT}
    at twitter4j.HttpClientImpl.request(HttpClientImpl.java:157)
    at twitter4j.HttpClientWrapper.request(HttpClientWrapper.java:58)
    at twitter4j.HttpClientWrapper.get(HttpClientWrapper.java:86)
    at twitter4j.TwitterImpl.get(TwitterImpl.java:2001)
    at twitter4j.TwitterImpl.showUser(TwitterImpl.java:886)
    at twitter4j.examples.user.ShowUser.main(ShowUser.java:42)

 


 

Finally I can continue with this post: a sample with a big data technology, in this case a Java map reduce task running on Apache Hadoop.

First of all, you need to install Hadoop, and I have to say that it is not trivial. Depending on your OS, you may install it with apt, yum, brew, etc., or, like I did, download a VMware image with all the necessary stuff. There are some providers, like Cloudera or IBM BigInsights. I chose the latter because I learned big data concepts at bigdatauniversity.com, an initiative from IBM.

Once you have downloaded the BigInsights VMware image, you can boot it, log in with biadmin/biadmin and then click on the Start BigInsights button; after a few minutes Hadoop will be up and running. Go to http://bivm:8080/data/html/index.html#redirect-welcome in the VM's Firefox and you can see it.

Once you have a Hadoop cluster to play with, it is time to code something, but first you need to analyze the text. I put only a little text here, but real data sets are terabytes, exabytes or more, with billions of lines in this format:

id ; Agente Registrador   ; Total dominios;
1  ; 1&1 Internet         ; 382.972;
36 ; WEIS CONSULTING      ; 4.154;
71 ; MESH DIGITAL LIMITED ; 910;

This is the mapper; its purpose is to emit a list of key/value pairs.

 

public class DominiosRegistradorMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {

    private static final String SEPARATOR = ";";

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException,
            InterruptedException {
        final String[] values = value.toString().split(SEPARATOR);
        for (int i = 0; i < values.length; i++) {
            /**
             * id ; Agente Registrador   ; Total dominios;
             * 1  ; 1&1 Internet         ; 382.972;
             * 36 ; WEIS CONSULTING      ; 4.154;
             * 71 ; MESH DIGITAL LIMITED ; 910;
             */
            final String agente = format(values[1]);
            final String totalDominios = format(values[2]);

            if (NumberUtils.isNumber(totalDominios))
                context.write(new Text(agente), new DoubleWritable(NumberUtils.toDouble(totalDominios)));

        } // end for
    }

    private String format(String value) {
        return value.trim();
    }
}

 

This is the reducer:

public class DominiosRegistradorReducer extends Reducer<Text, DoubleWritable, Text, Text> {

    private final DecimalFormat decimalFormat = new DecimalFormat("#.###");

    public void reduce(Text key, Iterable<DoubleWritable> totalDominiosValues, Context context)
            throws IOException, InterruptedException {
        double _maxTotalDominios = 0.0;

        for (DoubleWritable totalDominiosValue : totalDominiosValues) {
            double _total = totalDominiosValue.get();

            _maxTotalDominios = Math.max(_maxTotalDominios, _total);
        }
        // I need to keep the agent with the largest number of domains
        context.write(key, new Text(decimalFormat.format(_maxTotalDominios)));
    }
}

This is the main class:

public class App extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {

        if (args.length != 2) {
            System.err.println("DominiosRegistradorManager required params: {input file} {output dir}");
            System.exit(-1);
        }

        deleteOutputFileIfExists(args);

        final Job job = new Job(getConf(), "DominiosRegistradorManager");
        job.setJarByClass(App.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        job.setMapperClass(DominiosRegistradorMapper.class);
        job.setReducerClass(DominiosRegistradorReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(DoubleWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);

        return 0;
    }

    private void deleteOutputFileIfExists(String[] args) throws IOException {
        final Path output = new Path(args[1]);
        FileSystem.get(output.toUri(), getConf()).delete(output, true);
    }

    public static void main(String[] args) throws Exception {
        ToolRunner.run(new App(), args);
    }
}

Now that you have a glimpse of the code, you can download it and import it into your Eclipse. Once imported, you need to create a jar. With that jar and the cluster online you are almost ready to launch the code, but you probably need to import the huge text file with the data from http://datos.gob.es: download it and upload it to your cluster. I recommend using the browser for that: click on Start BigInsights if you have not done so yet, open the BigInsights web console and click Files; on the left you can see an HDFS tree, that is the Hadoop file system. Expand it down to /user/biadmin/ and create a directory, for example inputMR, so you can see /user/biadmin/inputMR in your tree. You must upload the example file to that directory. You need to create the outputMR directory as well.

[biadmin@bivm ~]$ hadoop jar nameOfYourJar.jar /user/biadmin/inputMR /user/biadmin/outputMR
14/05/12 12:09:24 INFO input.FileInputFormat: Total input paths to process : 2
14/05/12 12:09:24 WARN snappy.LoadSnappy: Snappy native library is available
14/05/12 12:09:24 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/05/12 12:09:24 INFO snappy.LoadSnappy: Snappy native library loaded
14/05/12 12:09:24 INFO mapred.JobClient: Running job: job_201405121126_0059
14/05/12 12:09:25 INFO mapred.JobClient: map 0% reduce 0%
14/05/12 12:09:31 INFO mapred.JobClient: map 50% reduce 0%
14/05/12 12:09:34 INFO mapred.JobClient: map 100% reduce 0%
14/05/12 12:09:43 INFO mapred.JobClient: map 100% reduce 100%
14/05/12 12:09:44 INFO mapred.JobClient: Job complete: job_201405121126_0059
14/05/12 12:09:44 INFO mapred.JobClient: Counters: 29
14/05/12 12:09:44 INFO mapred.JobClient: Job Counters
14/05/12 12:09:44 INFO mapred.JobClient: Data-local map tasks=2
14/05/12 12:09:44 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=8827
14/05/12 12:09:44 INFO mapred.JobClient: Launched map tasks=2
14/05/12 12:09:44 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/05/12 12:09:44 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/05/12 12:09:44 INFO mapred.JobClient: Launched reduce tasks=1
14/05/12 12:09:44 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=10952
14/05/12 12:09:44 INFO mapred.JobClient: File Input Format Counters
14/05/12 12:09:44 INFO mapred.JobClient: Bytes Read=197
14/05/12 12:09:44 INFO mapred.JobClient: File Output Format Counters
14/05/12 12:09:44 INFO mapred.JobClient: Bytes Written=19
14/05/12 12:09:44 INFO mapred.JobClient: FileSystemCounters
14/05/12 12:09:44 INFO mapred.JobClient: HDFS_BYTES_READ=413
14/05/12 12:09:44 INFO mapred.JobClient: FILE_BYTES_WRITTEN=76101
14/05/12 12:09:44 INFO mapred.JobClient: FILE_BYTES_READ=50
14/05/12 12:09:44 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=19
14/05/12 12:09:44 INFO mapred.JobClient: Map-Reduce Framework
14/05/12 12:09:44 INFO mapred.JobClient: Virtual memory (bytes) snapshot=3867070464
14/05/12 12:09:44 INFO mapred.JobClient: Reduce input groups=2
14/05/12 12:09:44 INFO mapred.JobClient: Combine output records=4
14/05/12 12:09:44 INFO mapred.JobClient: Map output records=4
14/05/12 12:09:44 INFO mapred.JobClient: CPU time spent (ms)=1960
14/05/12 12:09:44 INFO mapred.JobClient: Map input records=2
14/05/12 12:09:44 INFO mapred.JobClient: Reduce shuffle bytes=56
14/05/12 12:09:44 INFO mapred.JobClient: Combine input records=4
14/05/12 12:09:44 INFO mapred.JobClient: Spilled Records=8
14/05/12 12:09:44 INFO mapred.JobClient: SPLIT_RAW_BYTES=216
14/05/12 12:09:44 INFO mapred.JobClient: Map output bytes=36
14/05/12 12:09:44 INFO mapred.JobClient: Reduce input records=4
14/05/12 12:09:44 INFO mapred.JobClient: Physical memory (bytes) snapshot=697741312
14/05/12 12:09:44 INFO mapred.JobClient: Total committed heap usage (bytes)=746494976
14/05/12 12:09:44 INFO mapred.JobClient: Reduce output records=2
14/05/12 12:09:44 INFO mapred.JobClient: Map output materialized bytes=56
[biadmin@bivm ~]$

If you see something like this, congrats! Your map reduce task is done and the results are in /user/biadmin/outputMR.

The source is located at https://github.com/alonsoir/mrDominioRegistrador

The data is taken from http://datos.gob.es

http://en.wikipedia.org/wiki/MapReduce

Enjoy!

 

I like science in general, and space technologies, so it was clear that my first step was to use the latest know-how, http://aironman2k.wordpress.com/2014/05/05/about-web-sockets-and-how-to-use-it-in-order-to-get-real-time-data/, in order to know where the International Space Station is.

The idea is simple: I need to feed a RabbitMQ server with STOMP support with the JSON provided by an application server which is running a web service, and then the client needs to subscribe to a specific topic in order to print the data. The code is quite simple, so feel free to download it and share.
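For the feeding side, a bare-bones sketch with the plain RabbitMQ Java client would look like this (the exchange and routing key names are made up, and the hard-coded JSON string stands in for what the web service really returns):

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class IssPositionPublisher {

    public static void main(String[] args) throws Exception {
        // JSON as returned by the ISS position web service (hard-coded here for the example).
        String json = "{\"latitude\": 40.4, \"longitude\": -3.7}";

        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");

        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();
        try {
            // Topic exchange the browser clients subscribe to over STOMP.
            channel.exchangeDeclare("iss", "topic", true);
            channel.basicPublish("iss", "iss.position", null, json.getBytes("UTF-8"));
        } finally {
            channel.close();
            connection.close();
        }
    }
}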

If you are thinking that this project is very similar to the previous one, well, yes, it is similar; the difference is that this web service is behind a secure socket layer, so we need to import the cert file into our Java EE application server.

Please read mkyong for the details; it is very important because you can avoid man-in-the-middle attacks, or at least minimize the possible problem.

http://www.mkyong.com/webservices/jax-ws/suncertpathbuilderexception-unable-to-find-valid-certification-path-to-requested-target/

The code is located at https://github.com/alonsoir/whereisISS
