The idea of this project is to learn the basics of using Spark with ROOT files generated by LHC experiments.

## Preliminaries
Download Spark from the official site: https://spark.apache.org/downloads.html
Download the spark-root_2.11-0.1.17.jar file from https://github.com/diana-hep/spark-root/blob/master/jars/spark-root_2.11-0.1.17.jar
Download the ROOT files (about 6 GB) from http://opendata.cern.ch/record/3860
These files may be too big to work with on a laptop, so you may prefer to download the smaller files from
https://github.com/cerndb/SparkDLTrigger/tree/master/Data

That is all.

## Working
     ~> spark-shell --packages org.diana-hep:root4j:0.1.6 --jars /Users/aironman/Downloads/spark-root_2.11-0.1.17.jar
     Spark context Web UI available at http://192.168.1.36:4040
     Spark context available as 'sc' (master = local[*], app id = local-1557233773635).
     Spark session available as 'spark'.
     Welcome to
           ____              __
          / __/__  ___ _____/ /__
         _\ \/ _ \/ _ `/ __/  '_/
        /___/ .__/\_,_/_/ /_/\_\   version 2.4.1
           /_/

     Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_172)
     Type in expressions to have them evaluated.
     Type :help for more information.
Importing the spark-root library…
scala> import org.dianahep.sparkroot.experimental._
import org.dianahep.sparkroot.experimental._
Creating DataFrames from ROOT files…
scala> val dfGamma = spark.read.root("/Users/aironman/Downloads/complete_set_of_ATLAS_open_data_samples_July_2016/Data/DataEgamma.root")
    dfGamma: org.apache.spark.sql.DataFrame = [runNumber: int, eventNumber: int … 44 more fields]
scala> val dfMuons = spark.read.root("/Users/aironman/Downloads/complete_set_of_ATLAS_open_data_samples_July_2016/Data/DataMuons.root")
dfMuons: org.apache.spark.sql.DataFrame = [runNumber: int, eventNumber: int ... 44 more fields]

Creating parquet files…
    scala> val muonsParquetFile = dfMuons.write.parquet("dfMuons.parquet")
    muonsParquetFile: Unit = ()
A little typo when creating the gamma parquet file: dfGammam instead of dfGamma. The misspelled path is reused when reading it back below.
    scala> val gammaParquetFile = dfGamma.write.parquet("dfGammam.parquet")
    gammaParquetFile: Unit = ()

Reading the created parquet files…
    scala> val muonsParquetFile = spark.read.parquet("dfMuons.parquet")
    muonsParquetFile: org.apache.spark.sql.DataFrame = [runNumber: int, eventNumber: int … 44 more fields]
scala> val gammaParquetFile = spark.read.parquet("dfGammam.parquet")
gammaParquetFile: org.apache.spark.sql.DataFrame = [runNumber: int, eventNumber: int ... 44 more fields]
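
A quick sanity check one could add here (my own suggestion, not in the original session): the Parquet round trip should preserve the row counts.

    // muonsParquetFile / gammaParquetFile are the DataFrames read back above
    assert(muonsParquetFile.count() == dfMuons.count())
    assert(gammaParquetFile.count() == dfGamma.count())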

Caching DataFrames… (cache() is lazy: nothing is materialized until an action runs, such as the counts below)
     scala> dfMuons.cache()
     res4: dfMuons.type = [runNumber: int, eventNumber: int … 44 more fields]
scala> dfGamma.cache()
res5: dfGamma.type = [runNumber: int, eventNumber: int ... 44 more fields]

Counting rows…
scala> dfMuons.count
res2: Long = 7028084
scala> dfGamma.count
res3: Long = 7917590                                                            
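
Since both DataFrames were cached above, re-running an action should be much faster the second time. A small addition of my own: spark.time on the SparkSession gives a quick wall-clock measurement.

    // prints "Time taken: <n> ms" and returns the count; a second run
    // should hit the cache and be noticeably faster
    spark.time(dfMuons.count())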
Necessary imports (spark.implicits._ enables the $"column" syntax and the Dataset encoders used later)…
scala> import spark.implicits._
import spark.implicits._
Printing the schema to the output…
    scala> dfMuons.printSchema
    root
     |-- runNumber: integer (nullable = true)
     |-- eventNumber: integer (nullable = true)
     |-- channelNumber: integer (nullable = true)
     |-- mcWeight: float (nullable = true)
     |-- pvxp_n: integer (nullable = true)
     |-- vxp_z: float (nullable = true)
     |-- scaleFactor_PILEUP: float (nullable = true)
     |-- scaleFactor_ELE: float (nullable = true)
     |-- scaleFactor_MUON: float (nullable = true)
     |-- scaleFactor_BTAG: float (nullable = true)
     |-- scaleFactor_TRIGGER: float (nullable = true)
     |-- scaleFactor_JVFSF: float (nullable = true)
     |-- scaleFactor_ZVERTEX: float (nullable = true)
     |-- trigE: boolean (nullable = true)
     |-- trigM: boolean (nullable = true)
     |-- passGRL: boolean (nullable = true)
     |-- hasGoodVertex: boolean (nullable = true)
     |-- lep_n: integer (nullable = true)
     |-- lep_truthMatched: array (nullable = true)
     |    |-- element: boolean (containsNull = true)
     |-- lep_trigMatched: array (nullable = true)
     |    |-- element: short (containsNull = true)
     |-- lep_pt: array (nullable = true)
     |    |-- element: float (containsNull = true)
     |-- lep_eta: array (nullable = true)
     |    |-- element: float (containsNull = true)
     |-- lep_phi: array (nullable = true)
     |    |-- element: float (containsNull = true)
     |-- lep_E: array (nullable = true)
     |    |-- element: float (containsNull = true)
     |-- lep_z0: array (nullable = true)
     |    |-- element: float (containsNull = true)
     |-- lep_charge: array (nullable = true)
     |    |-- element: float (containsNull = true)
     |-- lep_type: array (nullable = true)
     |    |-- element: integer (containsNull = true)
     |-- lep_flag: array (nullable = true)
     |    |-- element: integer (containsNull = true)
     |-- lep_ptcone30: array (nullable = true)
     |    |-- element: float (containsNull = true)
     |-- lep_etcone20: array (nullable = true)
     |    |-- element: float (containsNull = true)
     |-- lep_trackd0pvunbiased: array (nullable = true)
     |    |-- element: float (containsNull = true)
     |-- lep_tracksigd0pvunbiased: array (nullable = true)
     |    |-- element: float (containsNull = true)
     |-- met_et: float (nullable = true)
     |-- met_phi: float (nullable = true)
     |-- jet_n: integer (nullable = true)
     |-- alljet_n: integer (nullable = true)
     |-- jet_pt: array (nullable = true)
     |    |-- element: float (containsNull = true)
     |-- jet_eta: array (nullable = true)
     |    |-- element: float (containsNull = true)
     |-- jet_phi: array (nullable = true)
     |    |-- element: float (containsNull = true)
     |-- jet_E: array (nullable = true)
     |    |-- element: float (containsNull = true)
     |-- jet_m: array (nullable = true)
     |    |-- element: float (containsNull = true)
     |-- jet_jvf: array (nullable = true)
     |    |-- element: float (containsNull = true)
     |-- jet_trueflav: array (nullable = true)
     |    |-- element: integer (containsNull = true)
     |-- jet_truthMatched: array (nullable = true)
     |    |-- element: integer (containsNull = true)
     |-- jet_SV0: array (nullable = true)
     |    |-- element: float (containsNull = true)
     |-- jet_MV1: array (nullable = true)
     |    |-- element: float (containsNull = true)

<pre class="wp-block-syntaxhighlighter-code brush: plain; notranslate">scala> dfGamma.printSchema
root
 |-- runNumber: integer (nullable = true)
 |-- eventNumber: integer (nullable = true)
 |-- channelNumber: integer (nullable = true)
 |-- mcWeight: float (nullable = true)
 |-- pvxp_n: integer (nullable = true)
 |-- vxp_z: float (nullable = true)
 |-- scaleFactor_PILEUP: float (nullable = true)
 |-- scaleFactor_ELE: float (nullable = true)
 |-- scaleFactor_MUON: float (nullable = true)
 |-- scaleFactor_BTAG: float (nullable = true)
 |-- scaleFactor_TRIGGER: float (nullable = true)
 |-- scaleFactor_JVFSF: float (nullable = true)
 |-- scaleFactor_ZVERTEX: float (nullable = true)
 |-- trigE: boolean (nullable = true)
 |-- trigM: boolean (nullable = true)
 |-- passGRL: boolean (nullable = true)
 |-- hasGoodVertex: boolean (nullable = true)
 |-- lep_n: integer (nullable = true)
 |-- lep_truthMatched: array (nullable = true)
 |    |-- element: boolean (containsNull = true)
 |-- lep_trigMatched: array (nullable = true)
 |    |-- element: short (containsNull = true)
 |-- lep_pt: array (nullable = true)
 |    |-- element: float (containsNull = true)
 |-- lep_eta: array (nullable = true)
 |    |-- element: float (containsNull = true)
 |-- lep_phi: array (nullable = true)
 |    |-- element: float (containsNull = true)
 |-- lep_E: array (nullable = true)
 |    |-- element: float (containsNull = true)
 |-- lep_z0: array (nullable = true)
 |    |-- element: float (containsNull = true)
 |-- lep_charge: array (nullable = true)
 |    |-- element: float (containsNull = true)
 |-- lep_type: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- lep_flag: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- lep_ptcone30: array (nullable = true)
 |    |-- element: float (containsNull = true)
 |-- lep_etcone20: array (nullable = true)
 |    |-- element: float (containsNull = true)
 |-- lep_trackd0pvunbiased: array (nullable = true)
 |    |-- element: float (containsNull = true)
 |-- lep_tracksigd0pvunbiased: array (nullable = true)
 |    |-- element: float (containsNull = true)
 |-- met_et: float (nullable = true)
 |-- met_phi: float (nullable = true)
 |-- jet_n: integer (nullable = true)
 |-- alljet_n: integer (nullable = true)
 |-- jet_pt: array (nullable = true)
 |    |-- element: float (containsNull = true)
 |-- jet_eta: array (nullable = true)
 |    |-- element: float (containsNull = true)
 |-- jet_phi: array (nullable = true)
 |    |-- element: float (containsNull = true)
 |-- jet_E: array (nullable = true)
 |    |-- element: float (containsNull = true)
 |-- jet_m: array (nullable = true)
 |    |-- element: float (containsNull = true)
 |-- jet_jvf: array (nullable = true)
 |    |-- element: float (containsNull = true)
 |-- jet_trueflav: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- jet_truthMatched: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- jet_SV0: array (nullable = true)
 |    |-- element: float (containsNull = true)
 |-- jet_MV1: array (nullable = true)
 |    |-- element: float (containsNull = true)</pre>
Cannot write these DataFrames to CSV files, only to Parquet. Why? Because the CSV data source only supports flat, atomic column types, and these DataFrames contain array columns:

scala> dfGamma.write.csv("gamma.csv")
org.apache.spark.sql.AnalysisException: CSV data source does not support array<boolean> data type.;
...  

scala> dfMuons.write.csv("muons.csv")
org.apache.spark.sql.AnalysisException: CSV data source does not support array<boolean> data type.;
  ...
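
A possible workaround (my own sketch, not from the original session; the output path is hypothetical): drop the array columns before writing, since CSV can only hold flat scalar values. The column set comes from the schema shown above.

    import org.apache.spark.sql.types.ArrayType

    // keep only the non-array columns so the CSV data source accepts them
    val scalarCols = dfMuons.schema.fields
      .filter(f => !f.dataType.isInstanceOf[ArrayType])
      .map(f => dfMuons.col(f.name))

    dfMuons.select(scalarCols: _*).write.csv("muons_scalar.csv")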

Inspecting the schema again, this time as a StructType value:

scala> dfMuons.schema
res9: org.apache.spark.sql.types.StructType = StructType(StructField(runNumber,IntegerType,true), StructField(eventNumber,IntegerType,true), StructField(channelNumber,IntegerType,true), StructField(mcWeight,FloatType,true), StructField(pvxp_n,IntegerType,true), StructField(vxp_z,FloatType,true), StructField(scaleFactor_PILEUP,FloatType,true), StructField(scaleFactor_ELE,FloatType,true), StructField(scaleFactor_MUON,FloatType,true), StructField(scaleFactor_BTAG,FloatType,true), StructField(scaleFactor_TRIGGER,FloatType,true), StructField(scaleFactor_JVFSF,FloatType,true), StructField(scaleFactor_ZVERTEX,FloatType,true), StructField(trigE,BooleanType,true), StructField(trigM,BooleanType,true), StructField(passGRL,BooleanType,true), StructField(hasGoodVertex,BooleanType,true), StructField(...

This selects a single field, but what does it mean? (met_et is the event's missing transverse energy.)

scala> dfMuons.select("met_et").show()
+---------+
|   met_et|
+---------+
|  94215.1|
|30354.057|
|54632.633|
|18974.707|
| 18013.09|
|36319.457|
|16567.258|
| 46948.73|
|26812.076|
|56296.965|
|47373.312|
| 22700.36|
| 66713.07|
|27822.385|
|40274.438|
| 42998.26|
|  26655.3|
|26368.762|
| 49253.22|
|19660.633|
+---------+
only showing top 20 rows


scala> dfMuons.select("met_et").count
res11: Long = 7028084                                                           

scala> dfMuons.select($"met_et",$"jet_jvf").count
res12: Long = 7028084                                                           

scala> dfMuons.select($"met_et",$"jet_jvf").show(5,false)
+---------+-----------+
|met_et   |jet_jvf    |
+---------+-----------+
|94215.1  |[]         |
|30354.057|[0.9523457]|
|54632.633|[]         |
|18974.707|[]         |
|18013.09 |[]         |
+---------+-----------+
only showing top 5 rows
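
Two things one might try here (my additions, using column names from the schema above): summary statistics for met_et, and exploding the jet_jvf array so each jet value gets its own row.

    import org.apache.spark.sql.functions.explode

    // count, mean, stddev, min and max of the missing transverse energy
    dfMuons.describe("met_et").show()

    // one row per array element; rows with empty jet_jvf arrays, like the
    // first event shown above, are dropped by explode()
    dfMuons.select($"met_et", explode($"jet_jvf").as("jvf")).show(5, false)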

Creating a temp view; I prefer to use SQL instead of map, reduce, and the other functional operators…

scala> dfMuons.createOrReplaceTempView("MUONS")

scala> val sqlDFMuons = spark.sql("SELECT * FROM MUONS").show(5,false)
...
|runNumber|eventNumber|channelNumber|mcWeight|pvxp_n|vxp_z     |scaleFactor_PILEUP|scaleFactor_ELE|scaleFactor_MUON|scaleFactor_BTAG|scaleFactor_TRIGGER|scaleFactor_JVFSF|scaleFactor_ZVERTEX|trigE|trigM|passGRL|hasGoodVertex|lep_n|lep_truthMatched|lep_trigMatched|lep_pt     |lep_eta      |lep_phi     |lep_E      |lep_z0        |lep_charge|lep_type|lep_flag   |lep_ptcone30|lep_etcone20|lep_trackd0pvunbiased|lep_tracksigd0pvunbiased|met_et   |met_phi   |jet_n|alljet_n|jet_pt     |jet_eta    |jet_phi  |jet_E      |jet_m     |jet_jvf    |jet_trueflav|jet_truthMatched|jet_SV0|jet_MV1      |
...
|207490   |17284798   |207490       |0.0     |13    |5.419942  |0.0               |0.0            |0.0             |0.0             |0.0                |0.0              |0.0                |false|true |true   |true         |1    |[false]         |[1]            |[29865.918]|[1.9333806]  |[-2.166267] |[105389.39]|[-0.008121626]|[1.0]     |[13]    |[568344575]|[0.0]       |[228.17558] |[-0.0022841152]      |[0.017784366]           |18013.09 |0.86960316|0    |0       |[]         ...
only showing top 5 rows

sqlDFMuons: Unit = ()

scala> val sqlDFMuons = spark.sql("SELECT * FROM MUONS").show(1,false)
|runNumber|eventNumber|channelNumber|mcWeight|pvxp_n|vxp_z     |scaleFactor_PILEUP|scaleFactor_ELE|scaleFactor_MUON|scaleFactor_BTAG|scaleFactor_TRIGGER|scaleFactor_JVFSF|scaleFactor_ZVERTEX|trigE|trigM|passGRL|hasGoodVertex|lep_n|lep_truthMatched|lep_trigMatched|lep_pt     |lep_eta   |lep_phi    |lep_E      |lep_z0        |lep_charge|lep_type|lep_flag   |lep_ptcone30|lep_etcone20|lep_trackd0pvunbiased|lep_tracksigd0pvunbiased|met_et |met_phi   |jet_n|alljet_n|jet_pt|jet_eta|jet_phi|jet_E|jet_m|jet_jvf|jet_trueflav|jet_truthMatched|jet_SV0|jet_MV1|
|207490   |17281852   |207490       |0.0     |15    |-12.316585|0.0               |0.0            |0.0             |0.0             |0.0                |0.0              |0.0                |false|true |true   |true         |1    |[false]         |[3]            |[40531.855]|[0.288244]|[1.3469992]|[42227.465]|[-0.045446984]|[-1.0]    |[13]    |[568344575]|[0.0]       |[94.18325]  |[-0.04912882]        |[0.0152232405]          |94215.1|-1.3943559|0    |0       |[]    |[]     |[]     |[]   |[]   |[]     |[]          |[]              |[]     |[]     |
...
only showing top 1 row

sqlDFMuons: Unit = ()

scala> val sqlDFMuons = spark.sql("SELECT * FROM MUONS").show(1,true)
...
|runNumber|eventNumber|channelNumber|mcWeight|pvxp_n|     vxp_z|scaleFactor_PILEUP|scaleFactor_ELE|scaleFactor_MUON|scaleFactor_BTAG|scaleFactor_TRIGGER|scaleFactor_JVFSF|scaleFactor_ZVERTEX|trigE|trigM|passGRL|hasGoodVertex|lep_n|lep_truthMatched|lep_trigMatched|     lep_pt|   lep_eta|    lep_phi|      lep_E|        lep_z0|lep_charge|lep_type|   lep_flag|lep_ptcone30|lep_etcone20|lep_trackd0pvunbiased|lep_tracksigd0pvunbiased| met_et|   met_phi|jet_n|alljet_n|jet_pt|jet_eta|jet_phi|jet_E|jet_m|jet_jvf|jet_trueflav|jet_truthMatched|jet_SV0|jet_MV1|
...
|   207490|   17281852|       207490|     0.0|    15|-12.316585|               0.0|            0.0|             0.0|             0.0|                0.0|              0.0|                0.0|false| true|   true|         true|    1|         [false]|            [3]|[40531.855]|[0.288244]|[1.3469992]|[42227.465]|[-0.045446984]|    [-1.0]|    [13]|[568344575]|       [0.0]|  [94.18325]|        [-0.04912882]|          [0.0152232405]|94215.1|-1.3943559|    0|       0|    []|     []|     []|   []|   []|     []|          []|              []|     []|     []|
...
only showing top 1 row

sqlDFMuons: Unit = ()
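
With the temp view registered, more selective queries are straightforward. A small example of my own (column names taken from the schema above): events with at least one jet, ordered by missing transverse energy.

    spark.sql("""
      SELECT runNumber, eventNumber, met_et, jet_n
      FROM MUONS
      WHERE jet_n > 0
      ORDER BY met_et DESC
    """).show(5)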

Note that show() returns Unit, which is why each sqlDFMuons above is just (). Next, setting some Parquet-related configuration…

scala> spark.sqlContext.setConf("parquet.filter.statistics.enabled","true")

scala> spark.sqlContext.setConf("parquet.filter.dictionary.enabled","true")

scala> spark.sqlContext.setConf("spark.sql.parquet.filterPushdown","true")

scala> spark.sqlContext.setConf("spark.sql.hive.convertMetastoreParquet","true")

scala> spark.sqlContext.setConf("spark.sql.hive.convertMetastoreParquet.mergeSchema","false")

scala> spark.sqlContext.setConf("spark.sql.parquet.mergeSchema","false")

scala> val sqlContext = spark.sqlContext
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@39e621bf
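
These settings enable Parquet predicate pushdown: a filter like the one below can then be evaluated against Parquet row-group statistics, skipping data that cannot match. A hypothetical illustration of mine, using the Parquet file written earlier:

    // the predicate is pushed into the Parquet reader instead of being
    // applied after a full scan of the file
    val highMet = spark.read.parquet("dfMuons.parquet").filter($"met_et" > 50000)
    highMet.count()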

import org.dianahep.sparkroot.experimental._

Here I am following this hint: "Feature Engineering and Event Selection", where the Parquet files containing all the event details processed in Data Ingestion are filtered, and datasets with new features are produced.

scala> val ds = dfMuons.as[Seq[Seq[Double]]]
org.apache.spark.sql.AnalysisException: Try to map struct<runNumber:int,eventNumber:int,channelNumber:int,mcWeight:float,pvxp_n:int,vxp_z:float,scaleFactor_PILEUP:float,scaleFactor_ELE:float,scaleFactor_MUON:float,scaleFactor_BTAG:float,scaleFactor_TRIGGER:float,scaleFactor_JVFSF:float,scaleFactor_ZVERTEX:float,trigE:boolean,trigM:boolean,passGRL:boolean,hasGoodVertex:boolean,lep_n:int,lep_truthMatched:array<boolean>,lep_trigMatched:array<smallint>,lep_pt:array<float>,lep_eta:array<float>,lep_phi:array<float>,lep_E:array<float>,lep_z0:array<float>,lep_charge:array<float>,lep_type:array<int>,lep_flag:array<int>,lep_ptcone30:array<float>,lep_etcone20:array<float>,lep_trackd0pvunbiased:array<float>,lep_tracksigd0pvunbiased:array<float>,met_et:float,met_phi:float,jet_n:int,alljet_n:int,jet_pt:array<float>,jet_eta:array<float>,jet_phi:array<float>,jet_E:array<float>,jet_m:array<float>,jet_jvf:array<float>,jet_trueflav:array<int>,jet_truthMatched:array<int>,jet_SV0:array<float>,jet_MV1:array<float>> to Tuple1, but failed as the number of fields does not line up.;
...

But if I examine this schema, it does not look like a Seq[Seq[Double]] at all… it is much more complicated!

<pre class="wp-block-syntaxhighlighter-code brush: plain; notranslate">## Again, printing schema… 
     scala> dfMuons.printSchema
     root
      |-- runNumber: integer (nullable = true)
      |-- eventNumber: integer (nullable = true)
      |-- channelNumber: integer (nullable = true)
      |-- mcWeight: float (nullable = true)
      |-- pvxp_n: integer (nullable = true)
      |-- vxp_z: float (nullable = true)
      |-- scaleFactor_PILEUP: float (nullable = true)
      |-- scaleFactor_ELE: float (nullable = true)
      |-- scaleFactor_MUON: float (nullable = true)
      |-- scaleFactor_BTAG: float (nullable = true)
      |-- scaleFactor_TRIGGER: float (nullable = true)
      |-- scaleFactor_JVFSF: float (nullable = true)
      |-- scaleFactor_ZVERTEX: float (nullable = true)
      |-- trigE: boolean (nullable = true)
      |-- trigM: boolean (nullable = true)
      |-- passGRL: boolean (nullable = true)
      |-- hasGoodVertex: boolean (nullable = true)
      |-- lep_n: integer (nullable = true)
      |-- lep_truthMatched: array (nullable = true)
      |    |-- element: boolean (containsNull = true)
      |-- lep_trigMatched: array (nullable = true)
      |    |-- element: short (containsNull = true)
      |-- lep_pt: array (nullable = true)
      |    |-- element: float (containsNull = true)
      |-- lep_eta: array (nullable = true)
      |    |-- element: float (containsNull = true)
      |-- lep_phi: array (nullable = true)
      |    |-- element: float (containsNull = true)
      |-- lep_E: array (nullable = true)
      |    |-- element: float (containsNull = true)
      |-- lep_z0: array (nullable = true)
      |    |-- element: float (containsNull = true)
      |-- lep_charge: array (nullable = true)
      |    |-- element: float (containsNull = true)
      |-- lep_type: array (nullable = true)
      |    |-- element: integer (containsNull = true)
      |-- lep_flag: array (nullable = true)
      |    |-- element: integer (containsNull = true)
      |-- lep_ptcone30: array (nullable = true)
      |    |-- element: float (containsNull = true)
      |-- lep_etcone20: array (nullable = true)
      |    |-- element: float (containsNull = true)
      |-- lep_trackd0pvunbiased: array (nullable = true)
      |    |-- element: float (containsNull = true)
      |-- lep_tracksigd0pvunbiased: array (nullable = true)
      |    |-- element: float (containsNull = true)
      |-- met_et: float (nullable = true)
      |-- met_phi: float (nullable = true)
      |-- jet_n: integer (nullable = true)
      |-- alljet_n: integer (nullable = true)
      |-- jet_pt: array (nullable = true)
      |    |-- element: float (containsNull = true)
      |-- jet_eta: array (nullable = true)
      |    |-- element: float (containsNull = true)
      |-- jet_phi: array (nullable = true)
      |    |-- element: float (containsNull = true)
      |-- jet_E: array (nullable = true)
      |    |-- element: float (containsNull = true)
      |-- jet_m: array (nullable = true)
      |    |-- element: float (containsNull = true)
      |-- jet_jvf: array (nullable = true)
      |    |-- element: float (containsNull = true)
      |-- jet_trueflav: array (nullable = true)
      |    |-- element: integer (containsNull = true)
      |-- jet_truthMatched: array (nullable = true)
      |    |-- element: integer (containsNull = true)
      |-- jet_SV0: array (nullable = true)
      |    |-- element: float (containsNull = true)
      |-- jet_MV1: array (nullable = true)
      |    |-- element: float (containsNull = true)
scala> 


scala> val ds = dfMuons.as[Seq[Seq[Double]]]
...
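
The conversion fails again with the same AnalysisException, as expected. A minimal sketch of a way forward (my own, with a hypothetical case class built from the schema above): select a subset of columns and map them to a matching case class, instead of forcing the whole 46-field row into nested sequences.

    // array<float> columns map naturally to Seq[Float];
    // spark.implicits._ (imported earlier) provides the encoder
    case class MuonEvent(runNumber: Int, eventNumber: Int, met_et: Float, jet_pt: Seq[Float])

    val ds = dfMuons
      .select($"runNumber", $"eventNumber", $"met_et", $"jet_pt")
      .as[MuonEvent]
    ds.show(5, false)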

## LINKS

https://github.com/diana-hep/spark-root/blob/master/ipynb/publicCMSMuonia_exampleAnalysis_wROOT.ipynb

https://es.wikipedia.org/wiki/Muon

https://www.lhc-closer.es/taking_a_closer_look_at_lhc/0.a___j

https://db-blog.web.cern.ch/blog/luca-canali/machine-learning-pipelines-high-energy-physics-using-apache-spark-bigdl

<p><a href="https://github.com/cerndb/SparkDLTrigger/blob/master/Docs/Poster.pdf" target="_blank" rel="noopener noreferrer nofollow">Click to access Poster.pdf</a></p>

<blockquote class="wp-embedded-content" data-secret="Lx52F3gDOf"><a href="https://databricks.com/sparkaisummit/north-america/2019-spark-summit-ai-keynotes">2019 Spark Summit + AI Keynotes April 24th</a></blockquote><iframe class="wp-embedded-content" sandbox="allow-scripts" security="restricted" style="position: absolute; clip: rect(1px, 1px, 1px, 1px);" title="&#8220;2019 Spark Summit + AI Keynotes April 24th&#8221; &#8212; Databricks" src="https://databricks.com/sparkaisummit/north-america/2019-spark-summit-ai-keynotes/embed#?secret=Lx52F3gDOf" data-secret="Lx52F3gDOf" width="600" height="338" frameborder="0" marginwidth="0" marginheight="0" scrolling="no"></iframe>

https://github.com/cerndb/SparkDLTrigger

http://canali.web.cern.ch/canali/

https://spark.apache.org/docs/latest/quick-start.html

https://pastebin.com/yCs6kjS2

This library could be useful for modifying values in a DataFrame:
https://github.com/hablapps/sparkOptics

Thank you Doctor Canali, I will follow your tips.

WORK IN PROGRESS!!
