Commonly used Hadoop commands
Here we are going to learn about Hadoop. Before we run our first MapReduce program, note that MapReduce examples are most commonly run on Linux, although here we're going to run them on both Linux and Windows. For those who are not familiar with core Linux, we have included a list of common Hadoop shell commands, which are very similar to Linux commands. Taken together, they trace the path you follow when executing a MapReduce job whose logic you have already written. We start with the Hadoop file system, and then you have these various commands:
- hadoop fs -cat file:///file2
- hadoop fs -mkdir /user/hadoop/dir1 /user/hadoop/dir2
- hadoop fs -copyFromLocal <fromDir> <toDir>
- hadoop fs -put <localfile> hdfs://nn.example.com/hadoop/hadoopfile
- sudo hadoop jar <jarfilename> <method> <fromDir> <toDir>
- hadoop fs -ls /user/hadoop/dir1
- hadoop fs -cat hdfs://nn.example.com/file1
- hadoop fs -get /user/hadoop/file <localfile>
- The cat command prints the contents of a file to the screen. To address the local file system, the convention is the word file, a colon, and three forward slashes (file:///).
- Next is mkdir, the file system make-directory command; here you are making directories in HDFS. Remember that when you work with MapReduce you are commonly working with two file systems: the underlying one (Linux or whatever your platform provides) and the Hadoop file system.
- The next command is copyFromLocal, which copies from the local file system into HDFS; you simply pass the from and to arguments.
- Then you can use the put command, which puts files from the local file system up onto HDFS. There are various naming conventions for the destination; we just give one example here, hdfs://nn.example.com/hadoop/hadoopfile.
- We have also included sudo, which may be new to those who have just begun. It is roughly equivalent to "Run as administrator": it runs a command as another user, usually the superuser. In some cases it might not be required, but we include it to show the syntax.
- The next command is jar, which runs a compiled Java MapReduce program. You pass the jar file, then the entry point to run when the jar is executed (shown as <method> above; in practice this is the name of the class whose main method starts the job), and then you say where the input is coming from and where the new output should go.
- When the MapReduce job finishes, you generally run some command to verify the output. You can run ls, which lists the contents of a directory, to determine the output file name. The output will not actually be called file1; the job creates an output directory containing multiple files, but we use file1 to keep things simple here.
- The cat command is often run to read the contents. Sometimes you just want to read a portion, and you can do that as well with other commands, such as hadoop fs -tail.
- Finally, you can run the get command to copy a file from HDFS down to the local file system.
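To see how these commands chain together, here is a minimal sketch of the whole path as a shell script. Everything named in it is a hypothetical placeholder rather than something from this article: the HDFS directories, input.txt, wordcount.jar, the WordCount class, and the part-r-00000 output file (a name jobs often produce, but check the ls output for the real one). By default the script only prints the commands, so it can be tried without a cluster.

```shell
#!/bin/sh
# Sketch of a typical sequence: stage input, run the job, inspect the output.
# All paths, the jar name and the class name below are hypothetical.

# Print each command instead of executing it unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

INPUT_DIR=/user/hadoop/input      # hypothetical HDFS input directory
OUTPUT_DIR=/user/hadoop/output    # hypothetical HDFS output directory
LOCAL_FILE=input.txt              # hypothetical local input file
JAR=wordcount.jar                 # hypothetical compiled MapReduce jar
MAIN_CLASS=WordCount              # hypothetical class containing main()
RESULT_FILE=part-r-00000          # assumed output name; confirm with -ls

workflow() {
    run hadoop fs -mkdir "$INPUT_DIR"                     # make the HDFS directory
    run hadoop fs -put "$LOCAL_FILE" "$INPUT_DIR"         # copy local file up to HDFS
    run hadoop jar "$JAR" "$MAIN_CLASS" "$INPUT_DIR" "$OUTPUT_DIR"  # run the job
    run hadoop fs -ls "$OUTPUT_DIR"                       # list the job output
    run hadoop fs -cat "$OUTPUT_DIR/$RESULT_FILE"         # read the result on screen
    run hadoop fs -get "$OUTPUT_DIR/$RESULT_FILE" result.txt  # copy it back locally
}

workflow
```

Run as is, the script prints the six commands in order. Setting DRY_RUN=0 executes them, which assumes a working Hadoop installation and real values in place of the placeholders.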