When learning big data, you need to master clusters in both theory and practice. Many beginners, however, run into one problem after another when they first try to build one. Today we start from the basics and share a tutorial on building a Hadoop pseudo-distributed cluster environment.
We will use three virtual machines for this exercise. Three virtual machines are feasible even on a PC with limited performance.
One of the three virtual machines serves as the master, that is, the primary node, and runs the Hadoop NameNode (the NameNode mainly records metadata about the data, such as file names and data block IDs);
The other two virtual machines serve as slaves, that is, the slave nodes, and run the Hadoop DataNode (the DataNode mainly stores the data itself and handles reads and writes; a data block is 128 MB by default);
The three virtual machines can communicate with one another, and each virtual machine can also communicate with the PC;
Next, let's build the Hadoop cluster step by step.
Step 1: Configure the network
For the PC and the virtual machines to communicate with each other, their IP addresses must be on the same network segment. If the virtual machines also need Internet access, they must use the same gateway as the PC. So the first task is to configure the network on both the PC and the virtual machines.
Step 2: Set a static IP address on the Linux system
After setting up the network card, configure the IP address, gateway, and subnet mask of the master node (one of the Linux virtual machines). The specific operations are as follows:
Delete the UUID (unique identification number), HWADDR (MAC address), and LAST_CONNECT (last connection time) values from the eth0 network card configuration. If they are not deleted, they will cause problems when the virtual machine is cloned later.
Set BOOTPROTO in the eth0 network card configuration to static (meaning a static IP), and add the static IP address (its network segment must match the PC's), the gateway, the subnet mask, and the DNS server.
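The two edits above can be sketched as a small shell function. This is only an illustration under assumptions: the function name make_static_ifcfg is made up for this example, the file is assumed to be the CentOS 6-style /etc/sysconfig/network-scripts/ifcfg-eth0, the subnet mask is assumed to be 255.255.255.0, and the gateway and DNS values in the usage comment are hypothetical.

```shell
# make_static_ifcfg FILE IPADDR GATEWAY DNS   (hypothetical helper)
# Rewrites FILE in place: removes the clone-sensitive keys and any old
# address settings, then appends a static configuration.
make_static_ifcfg() {
    file=$1
    ip=$2
    gateway=$3
    dns=$4
    tmp="${file}.tmp"

    # Drop UUID, HWADDR, LAST_CONNECT and earlier values of the keys set below
    grep -v -E '^(UUID|HWADDR|LAST_CONNECT|BOOTPROTO|IPADDR|NETMASK|GATEWAY|DNS1)=' "$file" > "$tmp"

    {
        echo "BOOTPROTO=static"
        echo "IPADDR=$ip"
        echo "NETMASK=255.255.255.0"   # assumed /24 subnet
        echo "GATEWAY=$gateway"
        echo "DNS1=$dns"
    } >> "$tmp"

    # Copy back with cat so the original file keeps its permissions
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example (as root on the master; gateway and DNS values are hypothetical):
# make_static_ifcfg /etc/sysconfig/network-scripts/ifcfg-eth0 192.168.8.100 192.168.8.1 192.168.8.1
# service network restart
```

Taking the file path as an argument makes it easy to try the function on a copy of the file before touching the real configuration.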
Step 3: Test communication between the PC and the virtual machine
The virtual machine (192.168.8.100) should be able to ping the PC (192.168.8.88), and it should also be able to ping baidu (which shows that it can reach the Internet).
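The check above can be scripted. The following is a minimal sketch, assuming the standard Linux ping command with its -c option; the function name check_hosts is made up for this example, and www.baidu.com is assumed as the host name for baidu.

```shell
# check_hosts HOST...   (hypothetical helper)
# Pings each host three times and prints OK or FAIL per host.
# Returns 0 only if every host answered.
check_hosts() {
    failed=0
    for host in "$@"; do
        if ping -c 3 "$host" > /dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}

# Example, run on the virtual machine:
# check_hosts 192.168.8.88 www.baidu.com
```

The same function can be reused later to check that all three virtual machines reach one another.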
Step 4: Turn off the firewall and SELinux
To avoid unnecessary trouble and make the later Hadoop cluster setup go more smoothly, it is best to turn off the virtual machine's firewall. The command is as follows:
chkconfig iptables off
To be safe, also edit the selinux file in the /etc/sysconfig directory to disable SELinux.
Restart the virtual machine and verify that the firewall has been turned off.
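As a rough sketch of the SELinux edit and the check after the restart: the function name disable_selinux is made up for this example, and the commands in the comments assume a CentOS 6-style system where service, chkconfig, and getenforce are available.

```shell
# disable_selinux [FILE]   (hypothetical helper)
# Sets SELINUX=disabled in FILE (default: /etc/sysconfig/selinux).
# Comment lines and the SELINUXTYPE= line are left untouched.
disable_selinux() {
    file=${1:-/etc/sysconfig/selinux}
    tmp="${file}.tmp"
    sed 's/^SELINUX=.*/SELINUX=disabled/' "$file" > "$tmp"
    # Write back with cat: /etc/sysconfig/selinux is usually a symlink,
    # and replacing it with mv would break the link
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example (as root):
# disable_selinux
#
# After the restart, check the result:
# service iptables status      # should report that the firewall is not running
# chkconfig --list iptables    # should show "off" for every runlevel
# getenforce                   # should print Disabled
```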
Step 5: Change the hostname
The virtual machine's hostname needs to be changed to master, so that the master node (master) can be told apart from the slave nodes (slaves) in the distributed cluster. Changing the hostname means editing two files, /etc/hosts and /etc/sysconfig/network. After editing both files, restart the virtual machine and the new hostname takes effect.
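The two file edits might look like the sketch below. The function names set_hostname and add_host are made up for this example, and the mapping of 192.168.8.100 to master assumes that the virtual machine tested in Step 3 is the master.

```shell
# set_hostname NAME [FILE]   (hypothetical helper)
# Replaces the HOSTNAME= line in FILE (default: /etc/sysconfig/network).
set_hostname() {
    name=$1
    file=${2:-/etc/sysconfig/network}
    tmp="${file}.tmp"
    grep -v '^HOSTNAME=' "$file" > "$tmp"
    echo "HOSTNAME=$name" >> "$tmp"
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# add_host IP NAME [FILE]   (hypothetical helper)
# Appends "IP NAME" to FILE (default: /etc/hosts) unless a line
# already ends with NAME.
add_host() {
    ip=$1
    name=$2
    file=${3:-/etc/hosts}
    if ! grep -q "[[:space:]]$name\$" "$file"; then
        echo "$ip $name" >> "$file"
    fi
}

# Example (as root on the master, followed by a restart):
# set_hostname master
# add_host 192.168.8.100 master
```

The same two functions can be reused on the clones with slave1 and slave2 as the names.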
Step 6: Clone the virtual machine
The steps so far configure only the master. As mentioned earlier, the Hadoop cluster is built on one master and two slaves, so we need two more virtual machines. Cloning is all it takes.
The cloning process is simple. On the VMware home page, right-click a virtual machine, choose Clone under Manage (select a full clone), click Next, set the name and installation location of the new virtual machine, then click Finish and wait for the clone to complete.
There is a problem, though: the cloned slave1 machine cannot connect to the network. That is because cloning gives the machine a new network card, eth1, which takes the place of eth0.
The fix is simple: edit the 70-persistent-net.rules file in the /etc/udev/rules.d/ directory. The file contains two rule lines, the first for eth0 and the second for eth1.
Comment out the first line (put a # in front of SUBSYSTEM on the first line), then change the eth1 value in the second line to eth0.
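The same edit can be sketched with awk. This is only an illustration under assumptions: the function name fix_udev_rules is made up for this example, and the rule lines are assumed to follow the usual CentOS 6 layout, starting with SUBSYSTEM and ending with NAME="eth0" or NAME="eth1".

```shell
# fix_udev_rules [FILE]   (hypothetical helper)
# Comments out the rule for the old network card (NAME="eth0") and renames
# the rule for the new card from eth1 to eth0.
fix_udev_rules() {
    file=${1:-/etc/udev/rules.d/70-persistent-net.rules}
    tmp="${file}.tmp"
    awk '
        # Old card: prefix the rule with # and move to the next line
        /^SUBSYSTEM/ && /NAME="eth0"/ { print "#" $0; next }
        # New card: rename eth1 to eth0, then fall through to print
        /^SUBSYSTEM/ && /NAME="eth1"/ { sub(/NAME="eth1"/, "NAME=\"eth0\"") }
        { print }
    ' "$file" > "$tmp"
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example (as root on the clone):
# fix_udev_rules
```

The renamed rule normally takes effect the next time the virtual machine starts.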
Configure the eth0 network card on the cloned virtual machine
Run vim /etc/sysconfig/network-scripts/ifcfg-eth0 and change the IP address to 192.168.8.101.
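If you prefer not to edit the file by hand, the change can be sketched as below. The function name set_ipaddr is made up for this example, and the file is assumed to already contain an IPADDR= line from Step 2.

```shell
# set_ipaddr IP [FILE]   (hypothetical helper)
# Replaces the value of the IPADDR= line in FILE
# (default: /etc/sysconfig/network-scripts/ifcfg-eth0).
set_ipaddr() {
    ip=$1
    file=${2:-/etc/sysconfig/network-scripts/ifcfg-eth0}
    tmp="${file}.tmp"
    sed "s/^IPADDR=.*/IPADDR=$ip/" "$file" > "$tmp"
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example (as root on slave1):
# set_ipaddr 192.168.8.101
# service network restart
```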
Step 7: Rename the cloned virtual machines
Edit the /etc/hosts file and the /etc/sysconfig/network file to change the clone's hostname to slave1. Similarly, clone one more virtual machine the same way slave1 was cloned, and change its hostname to slave2.
Finally, connect to all three virtual machines at the same time in Shell5 and test the communication between them. If they can all reach one another over the network and can also connect to the Internet, the setup has succeeded.
In general, learning to build a cluster environment is a key step in learning big data. The next step is to learn the big data technology frameworks.