The idea is simple: a producer process pushes JSON tweets from Twitter into a Kafka topic, and a Spark Streaming process pulls data from that topic and stores it in a MongoDB instance.
For the streaming process I have adopted the second approach, known as the Direct approach, because it is more resilient: if the producer dies, the streaming process simply waits for new data and resumes processing tweets once it arrives. With the first approach, the Receiver-based one, the streaming process would die if the producer died or had no more data to push into the topic.
Before running the processes, you have to edit the file src/main/resources/reference.conf with your own settings.
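As a rough sketch of what that configuration might contain (the actual keys are defined by the project itself; the names below are illustrative assumptions, in Typesafe Config / HOCON syntax):

```hocon
# Hypothetical reference.conf sketch -- key names are assumptions,
# check the project's actual file for the real ones.
twitter {
  consumer-key        = "YOUR_CONSUMER_KEY"        # Twitter API credentials
  consumer-secret     = "YOUR_CONSUMER_SECRET"
  access-token        = "YOUR_ACCESS_TOKEN"
  access-token-secret = "YOUR_ACCESS_TOKEN_SECRET"
}

kafka {
  brokers = "localhost:9092"   # comma-separated list of Kafka brokers
  topic   = "tweets"           # topic the producer writes to and the stream reads from
}

mongo {
  host     = "localhost"
  port     = 27017
  database = "twitter"
  collection = "tweets"
}
```

Whatever the real key names are, the Twitter credentials, Kafka broker list and topic, and MongoDB connection details are the settings you will need to fill in.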
The code for the project is located here, so download it and have fun!