Recently I forked a GitHub project (thanks, Marcos) that lets you create a Spark cluster using Docker. It is especially well suited if you have a powerful machine at your disposal: several cores, plenty of RAM and a large hard disk.
I have been working on getting a small cluster running on my laptop, a MacBook Pro with a fourth-generation i7 and 16GB of DDR4 1600 MHz RAM. The reason for the fork was to make the project work there, because it referenced directories that do not exist on OS X; I also had to update the Spark version and add a Spark job that could serve as a test.
I modified it to work on Linux as well, since the /tmp folder exists on both Linux and OS X, and probably on Windows too, though I am not completely sure.
The Spark version is 2.4.0, the latest at the time of writing. If it has changed by the time you feel tempted to use this fork, make sure you modify the docker-compose.yml file everywhere the image tag appears; as you can see, it is configured for version 2.4.0.
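To give you an idea of what to look for, the relevant lines look roughly like this. This is a hypothetical excerpt, not the fork's actual file: the service and image names here are placeholders, so check your own docker-compose.yml for the real ones.

```yaml
# Hypothetical docker-compose.yml excerpt; service and image names are
# illustrative -- update the 2.4.0 tag in every service that carries it.
services:
  spark-master:
    image: spark-master:2.4.0   # bump this tag when a new Spark comes out
  spark-worker-1:
    image: spark-worker:2.4.0   # ...and the same tag in each worker
```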
The fork is fairly well explained in the README file, so I will only add that I included a new worker in the docker-compose.yml file, with its configuration, plus a script that checks that the cluster is working: it uses docker exec -ti calls to verify that the jar and the input file exist on the master and on the cluster workers, and a docker run call to execute the jar. That said, I recommend connecting to the driver's IP and launching the job from there using the traditional spark-submit command.
I have found that it works better for me.
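As a sketch of that workflow, this is roughly what launching from inside the driver looks like. Everything here is an assumption on my part: the container name, jar path, class name and input file are placeholders, not the actual values from the fork.

```shell
# Open a shell inside the master/driver container
# (the container name "spark-master" is an assumption).
docker exec -it spark-master /bin/bash

# From inside the container, submit the job the traditional way.
# Jar path, main class and input file below are placeholders.
spark-submit \
  --master spark://spark-master:7077 \
  --class com.example.WordCount \
  /opt/spark-apps/app.jar /opt/spark-data/input.txt
```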
Make sure the Docker daemon is up and running, then start the project with the docker-compose up command.
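In practice that boils down to something like the following, run from the directory containing docker-compose.yml (the -d flag is my own preference for keeping the terminal free):

```shell
# Start the whole cluster in the background.
docker-compose up -d

# Verify that the master and worker containers came up.
docker-compose ps
```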
The spark-master web UI will be served at 0.0.0.0:8080 and the workers' at 0.0.0.0:8081, each container using a different internal IP inside the Docker network.