spark-shell
Follow instructions for your environment.
$ spark-shell
Type this in Spark Shell to reduce the log output to warnings and errors:
sc.setLogLevel("WARN")
Go to http://localhost:4040 in the browser.
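If port 4040 is already taken (for example by another running Spark Shell), Spark binds the UI to the next free port (4041, 4042, and so on). The sketch below is one way to confirm the actual address from inside the shell. It assumes Spark 2.0 or later, where `SparkContext.uiWebUrl` is available, and relies on the `sc` variable that Spark Shell predefines.

```scala
// Run inside Spark Shell, where `sc` is predefined.
// uiWebUrl is an Option[String]; it is None when the UI is disabled.
val uiUrl = sc.uiWebUrl.getOrElse("UI disabled")
println(s"Spark UI: $uiUrl")

// Basic facts about this session, which also appear in the UI.
println(s"Spark version: ${sc.version}")
println(s"Master: ${sc.master}")
println(s"Application ID: ${sc.applicationId}")
```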
Let's load the /etc/hosts file in Spark Shell.
Issue the following commands in Spark Shell:
val f = sc.textFile("file:///etc/hosts")
// count how many lines there are
f.count
// print the first line
f.first
// fetch all lines (the shell prints the resulting array)
f.collect
**=> Inspect the Spark Shell UI (port 4040)**
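`collect` copies every line to the driver, which is fine for a small file like /etc/hosts but can exhaust driver memory on large data sets. The sketch below shows a few alternatives on the same file. The search string `localhost` is only an illustration and may not appear in your hosts file.

```scala
// Run inside Spark Shell, where `sc` is predefined.
val f = sc.textFile("file:///etc/hosts")

// Fetch only the first 3 lines instead of the whole file.
val firstThree = f.take(3)
firstThree.foreach(println)

// Keep only the lines that mention "localhost" (illustrative search term).
val localLines = f.filter(line => line.contains("localhost"))
println(localLines.count)

// Print every line on its own row on the driver.
f.collect.foreach(println)
```

Each action here (`take`, `count`, `collect`) should show up as a separate job in the UI.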
Let's load some sample clickstream data from `/user/root/clickstream/in/clickstream.csv`. Try the following in Spark Shell:
val h = sc.textFile("/user/root/clickstream/in/clickstream.csv")
// count the lines
h.count
// fetch all lines (the shell prints the resulting array)
h.collect
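Each element of `h` is one raw line of text. As a rough sketch of a possible next step, the lines can be split into fields. This assumes the file is comma-separated (suggested by the `.csv` extension; the column layout is not specified here) and that no field contains a quoted comma, which this naive split would not handle.

```scala
// Run inside Spark Shell, where `sc` is predefined.
val h = sc.textFile("/user/root/clickstream/in/clickstream.csv")

// Split each line on commas; the -1 limit keeps trailing empty fields.
val fields = h.map(line => line.split(",", -1))

// Count how many lines have each number of fields.
// countByValue returns a Map of (field count -> number of lines).
val fieldCounts = fields.map(_.length).countByValue()
fieldCounts.foreach { case (n, lines) => println(s"$n fields: $lines lines") }

// Show the fields of the first line, one per row (fails if the file is empty).
fields.first.foreach(println)
```

If every line reports the same field count, the file is probably well-formed for this simple parsing approach.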
Now let's load all the data in the clickstream/in directory.
val h = sc.textFile("/user/root/clickstream/in/")
// count the lines
h.count
**=> Inspect the Spark Shell UI (port 4040)**
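When `textFile` is given a directory, it reads every file in that directory into a single RDD. The sketch below is one way to relate this to what the UI shows: it prints the number of partitions of `h`, which typically matches the number of tasks in the stage that ran `count`, and lists the files that were read. It assumes the files are small enough for `wholeTextFiles`, which loads each file as a single record.

```scala
// Run inside Spark Shell, where `sc` is predefined.
val h = sc.textFile("/user/root/clickstream/in/")

// Number of partitions in the RDD; compare it with the task count in the UI.
println(h.getNumPartitions)

// wholeTextFiles returns (file path, file content) pairs; keep only the paths.
val files = sc.wholeTextFiles("/user/root/clickstream/in/").keys.collect
files.foreach(println)
```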