When you develop on Apache Spark, a very common use case is reading data from HDFS. As you may know, HDFS can be accessed via its URI, which looks like hdfs://<hostname>:<port>/user/…
When you start your Spark shell, the SparkContext will be available as “sc”.
My file is at the following location in HDFS:
hdfs://localhost:8020/user/spark/abc.txt
The file can be loaded as an RDD of lines (Spark reads it lazily, only when an action such as count runs) via:
val file = sc.textFile("hdfs://localhost:8020/user/spark/abc.txt")
How did I find my HDFS host and port?
Go to your <HADOOP_HOME>/etc/hadoop directory. If your configuration files are stored at a different location, navigate to the directory specified as HADOOP_CONF_DIR in your environment variables. Open the core-site.xml file and look for the configuration property fs.defaultFS. Its value is the HDFS URI, including the host and (usually) the port.
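If you would rather not open the file by hand, the lookup can be sketched in a few lines of Scala. This is only a minimal sketch using the JDK's XML parser: the object name CoreSiteReader and the method defaultFs are hypothetical names made up for illustration, and it assumes you pass in the text of a core-site.xml laid out in the standard <property>/<name>/<value> form.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

// Hypothetical helper, not part of Spark or Hadoop.
object CoreSiteReader {

  /** Returns the value of fs.defaultFS from the text of a core-site.xml, if present. */
  def defaultFs(coreSiteXml: String): Option[String] = {
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    val doc = builder.parse(
      new ByteArrayInputStream(coreSiteXml.getBytes(StandardCharsets.UTF_8)))
    val properties = doc.getElementsByTagName("property")

    (0 until properties.getLength)
      .map(i => properties.item(i).asInstanceOf[Element])
      .find(p => childText(p, "name").contains("fs.defaultFS"))
      .flatMap(p => childText(p, "value"))
  }

  // Text of the first <tag> element under parent, trimmed, or None if absent.
  private def childText(parent: Element, tag: String): Option[String] = {
    val nodes = parent.getElementsByTagName(tag)
    if (nodes.getLength > 0) Some(nodes.item(0).getTextContent.trim) else None
  }
}
```

To use it, read the file into a string (for example with scala.io.Source.fromFile) and call CoreSiteReader.defaultFs on it. Inside the Spark shell, sc.hadoopConfiguration.get("fs.defaultFS") should return the same setting, provided Spark has picked up your Hadoop configuration directory.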
Recommendations:
If no port is specified there, try 8020 or 9000: 8020 is the usual NameNode default, and 9000 shows up in many example configurations. You can also try accessing the file system without specifying the port, but I have sometimes seen errors thrown with that approach. In any case, if your Hadoop installation has a different port explicitly configured, you will need to use it.
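The "fill in a port if none is given" step can be sketched with java.net.URI. The object HdfsPorts and the method withPort are hypothetical names used only for illustration, and the fallback of 8020 is an assumption that you should replace with whatever your cluster actually uses.

```scala
import java.net.URI

// Hypothetical helper, not part of Spark or Hadoop.
object HdfsPorts {

  /** Adds fallbackPort to an HDFS URI that has a host but no explicit port.
    * URIs that already carry a port, or have no host at all, are returned unchanged.
    */
  def withPort(uri: String, fallbackPort: Int = 8020): String = {
    val parsed = new URI(uri)
    if (parsed.getPort != -1 || parsed.getHost == null) uri
    else new URI(
      parsed.getScheme, parsed.getUserInfo, parsed.getHost, fallbackPort,
      parsed.getPath, parsed.getQuery, parsed.getFragment
    ).toString
  }
}
```

For example, HdfsPorts.withPort("hdfs://localhost/user/spark/abc.txt") yields the URI with :8020 added, while a URI that already names a port passes through untouched.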
Another good practice (in my view) is to keep a single global value that points to your HDFS root directory. You can then specify all paths relative to it in later parts of your code.
e.g. val hdfsURI = "hdfs://localhost:8020/"
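One way to build paths relative to that root is a small helper like the one below. It is only a sketch: the object HdfsPaths and the method under are hypothetical names, and the root value is simply the example URI from above, so swap in your own fs.defaultFS.

```scala
// Hypothetical helper, not part of Spark or Hadoop.
object HdfsPaths {

  // Assumed root, copied from the example above; replace with your own value.
  val hdfsURI = "hdfs://localhost:8020/"

  /** Joins a relative path onto the HDFS root, so that exactly one slash separates them. */
  def under(relative: String, root: String = hdfsURI): String =
    root.stripSuffix("/") + "/" + relative.stripPrefix("/")
}
```

In the Spark shell this would be used as val file = sc.textFile(HdfsPaths.under("user/spark/abc.txt")), so the host and port live in one place and only the relative part appears in the rest of the code.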