About an example using Kafka, Spark Streaming, MongoDB and Twitter

Hi everyone. As part of my process of mastering Scala and big data technologies, I am learning how to integrate Apache Kafka, Spark Streaming, MongoDB and Twitter.

The idea is simple: a producer process pushes JSON tweets from Twitter into a Kafka topic, while a Spark Streaming process pulls data from that topic and stores it in a MongoDB instance.
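The producer half of that pipeline can be sketched as follows. This is a minimal illustration, not the project's actual code: the broker address `localhost:9092`, the topic name `tweets`, and the hard-coded sample JSON are all assumptions standing in for the real Twitter stream.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object TweetProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092") // assumed broker address
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)

    // In the real project this JSON would come from the Twitter streaming API;
    // here a hard-coded sample stands in for a tweet.
    val tweetJson = """{"id": 1, "text": "hello kafka"}"""
    producer.send(new ProducerRecord[String, String]("tweets", tweetJson))
    producer.close()
  }
}
```

Each tweet is sent as a plain JSON string, so the consumer side only needs a string deserializer and can parse the payload however it likes.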

For the streaming process I have adopted the second approach, known as the Direct approach, because if the producer dies, the streaming process simply waits for it to come back before continuing to process tweets, which is more convenient. With the first approach, known as the Receiver-based approach, the streaming process would die if the producer died or had no more data to push into the topic.
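A minimal sketch of the Direct approach on the consumer side might look like this, using the Spark 1.x `KafkaUtils.createDirectStream` API and the Casbah Mongo driver. The broker address, topic name, batch interval, and the `twitter`/`tweets` database and collection names are all assumptions for illustration:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object TweetConsumer {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-tweets").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(5)) // assumed batch interval

    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
    val topics      = Set("tweets")

    // Direct approach: no long-running receiver; each batch computes its own
    // offset range and reads it straight from the brokers, so the job keeps
    // polling even while the producer is down.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    stream.map(_._2).foreachRDD { rdd =>
      rdd.foreachPartition { tweets =>
        // One Mongo connection per partition, closed when the partition is done.
        val client     = com.mongodb.casbah.MongoClient("localhost", 27017)
        val collection = client("twitter")("tweets")
        tweets.foreach { json =>
          collection.insert(
            com.mongodb.util.JSON.parse(json).asInstanceOf[com.mongodb.DBObject])
        }
        client.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Opening the Mongo connection inside `foreachPartition` rather than on the driver keeps the client on the executors, where the data actually lives.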

Before you can run the different processes, you have to edit the file src/main/resources/reference.conf with your own data.
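As a rough idea of what that configuration might contain, here is a hypothetical HOCON layout; the actual key names in the project may differ, and the placeholder values must be replaced with your own credentials:

```hocon
# Hypothetical layout -- check the project's reference.conf for the real keys.
twitter {
  consumerKey       = "YOUR_CONSUMER_KEY"
  consumerSecret    = "YOUR_CONSUMER_SECRET"
  accessToken       = "YOUR_ACCESS_TOKEN"
  accessTokenSecret = "YOUR_ACCESS_TOKEN_SECRET"
}

kafka {
  brokers = "localhost:9092"
  topic   = "tweets"
}

mongodb {
  host = "localhost"
  port = 27017
}
```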

The code for the project is located here, so download it and have fun!
