Hi, these last few days I have been working on a solution involving MongoDB, Twitter4j, Spark Streaming and machine learning (k-means) using Scala. The project needs sbt to build, and it is a continuation of my previous project on Cassandra, Spark Streaming and machine learning with Scala, so if you want sbt test to work, you will need both a Cassandra server and a MongoDB server running on your local machine.
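Since sbt test fails if either server is down, it can help to check both before running the tests. Here is a minimal sketch, assuming the servers listen on their default ports on localhost (27017 for MongoDB, 9042 for Cassandra's CQL native transport); the object and method names are hypothetical, and you should change the ports if your setup differs. It only opens a plain TCP connection, so it tells you a port is accepting connections, not that the database is healthy.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

object LocalServicesCheck {
  // Assumed default ports; adjust to match your local installation.
  val services: Map[String, Int] = Map(
    "cassandra" -> 9042, // CQL native transport
    "mongodb"   -> 27017
  )

  /** True if a TCP connection to host:port succeeds within timeoutMs milliseconds. */
  def isReachable(host: String, port: Int, timeoutMs: Int = 1000): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      true
    } catch {
      // Covers connection refused and connect timeouts.
      case _: IOException => false
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    services.foreach { case (name, port) =>
      val status = if (isReachable("localhost", port)) "up" else "DOWN"
      println(s"$name (localhost:$port): $status")
    }
  }
}
```

Running it before sbt test prints one line per service, so a missing server shows up immediately instead of as a failed test.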
The project is based on the Databricks reference app and the Stratio Spark-MongoDB library; basically I just adapted what was necessary to store JSON tweets in a MongoDB instance using Stratio's library. I started with the Casbah library, but I found it unclear to use, while the Stratio library is much easier. I also found that Stratio provides a Cassandra connector, which looks promising, so I will use it in the near future.
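The clustering itself is done with Spark MLlib's KMeans in the project. To show what that step does conceptually, here is a plain Scala sketch with no Spark dependency: it turns text into a fixed-size vector by hashing character bigrams (the hashing trick) and then runs a few Lloyd iterations of k-means. This is an illustration under assumptions, not the project's code: NumFeatures, the first-k initialization and all the names are hypothetical choices.

```scala
object TweetClusterSketch {
  val NumFeatures = 1000 // hypothetical vector size

  /** Hashing-trick term frequencies over character bigrams of the text. */
  def featurize(text: String): Array[Double] = {
    val v = Array.fill(NumFeatures)(0.0)
    text.sliding(2).foreach { bigram =>
      // Non-negative modulo so negative hash codes map into range.
      val idx = ((bigram.hashCode % NumFeatures) + NumFeatures) % NumFeatures
      v(idx) += 1.0
    }
    v
  }

  /** Squared Euclidean distance between two vectors of equal length. */
  def sqDist(a: Array[Double], b: Array[Double]): Double =
    a.indices.map { i => val d = a(i) - b(i); d * d }.sum

  /** Index of the centre nearest to p. */
  def closest(p: Array[Double], centers: Seq[Array[Double]]): Int =
    centers.indices.minBy(i => sqDist(p, centers(i)))

  /** Lloyd's k-means, initialised with the first k points. */
  def kMeans(points: Seq[Array[Double]], k: Int, iterations: Int): Seq[Array[Double]] = {
    require(k > 0 && points.size >= k, "need at least k points")
    var centers: Seq[Array[Double]] = points.take(k)
    for (_ <- 1 to iterations) {
      // Assignment step: group points by their nearest centre.
      val groups = points.groupBy(p => closest(p, centers))
      // Update step: move each centre to the mean of its members.
      centers = centers.indices.map { i =>
        groups.get(i) match {
          case Some(members) =>
            Array.tabulate(members.head.length)(d => members.map(_(d)).sum / members.size)
          case None => centers(i) // keep an empty cluster's centre unchanged
        }
      }
    }
    centers
  }
}
```

Usage would be something like kMeans(tweets.map(featurize), k, 10), then closest(featurize(newTweet), centers) to assign an incoming tweet to a cluster, which is the role the trained model plays inside the streaming job.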
The next step is to integrate this project with a Kafka broker…
Have fun and be nice to people.
I have been through a stomach flu, so forgive me if this post is not clear. I think the project is self-explanatory, and you will be able to adapt the sources to your needs without a problem.