
Install the build environment and required dependencies

$ sudo apt install dkms build-essential pkg-config libglib2.0-dev libpixman-1-dev libusb-dev libusbredirparser-dev libfdt-dev libbz2-dev flex bison

Download and extract the source code

$ wget https://download.qemu.org/qemu-4.0.0.tar.xz
$ tar xvJf qemu-4.0.0.tar.xz

Configure and build

Build only the x86_64 target

$ cd qemu-4.0.0
$ mkdir build
$ cd build
$ ../configure --target-list=x86_64-softmmu --enable-debug
$ make -j8

Install

$ sudo make install

The binaries are installed under /usr/local/.
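
To sanity-check the installation, the emulator can be asked for its version (just a quick check; the exact output depends on the build):

$ /usr/local/bin/qemu-system-x86_64 --version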

Security matters, but it can also be a nuisance.

tomcat9 refuses to start inside the LXD container test8. The service status looks like this:

# systemctl status tomcat9
● tomcat9.service - Apache Tomcat 9 Web Application Server
Loaded: loaded (/lib/systemd/system/tomcat9.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2019-06-06 20:52:24 CST; 10min ago
Docs: https://tomcat.apache.org/tomcat-9.0-doc/index.html
Process: 187 ExecStartPre=/usr/libexec/tomcat9/tomcat-update-policy.sh (code=exited, status=0/SUCCESS)
Process: 191 ExecStart=/bin/sh /usr/libexec/tomcat9/tomcat-start.sh (code=exited, status=226/NAMESPACE)
Main PID: 191 (code=exited, status=226/NAMESPACE)

Jun 06 20:52:24 test8 systemd[1]: Starting Apache Tomcat 9 Web Application Server...
Jun 06 20:52:24 test8 systemd[1]: Started Apache Tomcat 9 Web Application Server.
Jun 06 20:52:24 test8 systemd[191]: tomcat9.service: Failed to set up mount namespacing: Permission denied
Jun 06 20:52:24 test8 systemd[191]: tomcat9.service: Failed at step NAMESPACE spawning /bin/sh: Permission denied
Jun 06 20:52:24 test8 systemd[1]: tomcat9.service: Main process exited, code=exited, status=226/NAMESPACE
Jun 06 20:52:24 test8 systemd[1]: tomcat9.service: Failed with result 'exit-code'.

This is AppArmor blocking access to some resources. Getting the fine-grained configuration right requires a careful read of the documentation; for now, AppArmor confinement for the container can simply be switched off:

$ lxc config set test8 raw.lxc "lxc.apparmor.profile=unconfined"
$ lxc restart test8

After that, the tomcat9 service starts normally.
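
A quick way to confirm the override is in place and the service is up (assuming the container is still named test8):

$ lxc config show test8 | grep -A1 raw.lxc
$ lxc exec test8 -- systemctl is-active tomcat9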

Suppose there are two LXD hosts, lxd-l and lxd-r. On lxd-l, add a remote named lxd-r:

$ lxc remote add lxd-r <ip of lxd-r>

Copying containers

  1. Stop the container and copy it to the remote:
$ lxc stop pridns
$ lxc copy pridns lxd-r:pridns-backup
$ lxc start pridns
  2. Take a snapshot of the container (the first snapshot is snap0, the next snap1, and so on), then copy the snapshot to lxd-r:

    $ lxc snapshot my-container
    $ lxc copy my-container/snap0 lxd-r:my-container-backup
  3. Copy a running container directly:

    $ lxc copy pridns lxd-r:pridns-backup
    Error: Unable to perform container live migration. CRIU isn't installed on the source server

    Copying a running container directly is called live migration: the container's runtime state is copied along with it so that source and target stay identical. This requires CRIU support. LXD already bundles CRIU, but it has to be enabled on both the local and the remote host:

    $ sudo snap set lxd criu.enable=true
    $ sudo systemctl reload snap.lxd.daemon

    Then retry the copy; a quick check of the result follows below.
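
Whichever method is used, the result can be verified by listing what now exists on the remote:

$ lxc list lxd-r:
$ lxc info lxd-r:pridns-backup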

lxc move works the same way when the container should be moved rather than copied.

LXD creates containers from images, and there are several different ways to get hold of an image.

Built-in image servers

LXD comes with three built-in image servers:
ubuntu - stable Ubuntu images
ubuntu-daily - daily Ubuntu builds
images - images for other distributions

The built-in image servers are used like this:

$ lxc launch ubuntu:18.04 my-ubuntu
$ lxc launch ubuntu-daily:19.04 my-ubuntu-dev
$ lxc launch images:debian/buster/amd64 my-debian

List the available images:

$ lxc image list ubuntu:
$ lxc image list ubuntu-daily:
$ lxc image list images:

Using images from a remote LXD instance

Add the remote LXD instance:

$ lxc remote add my-images 192.168.0.x

Launch a container from one of its images:

$ lxc launch my-images:image-name your-container

List the images on the remote LXD instance:

$ lxc image list my-images:

In fact, the built-in image servers are just preconfigured remotes like any other; they simply happen to be named ubuntu, ubuntu-daily, and images.
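
The configured remotes, including these built-in ones, can be listed with:

$ lxc remote list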

Importing an image file manually

$ lxc image import <file> --alias my-alias

Launch a container from the imported image:

$ lxc launch my-alias my-container

Creating an image from a container or snapshot

From a container:

$ lxc stop my-container
$ lxc publish my-container --alias test-image

From a snapshot:

$ lxc publish my-container/my-snap --alias test-image2

These images can then be used to launch containers in the usual way.
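
For example, using the test-image alias published above (the new container name here is arbitrary):

$ lxc image list
$ lxc launch test-image c-from-image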

The examples below use a container named test8 running debian/buster/amd64.

Creating a container snapshot

$ lxc snapshot test8 test8snap0

By default this creates a stateless snapshot. To save the container's current runtime state into the snapshot as well, add the --stateful flag, for example:

$ lxc snapshot test8 test8snap1 --stateful

Viewing container snapshots

$ lxc info test8
Name: test8
Remote: unix://
Architecture: x86_64
Created: 2019/06/04 07:52 UTC
Status: Running
Type: persistent
Profiles: default
Pid: 7769
Ips:
eth0: inet 192.168.0.8 vethM2PWUN
eth0: inet6 fe80::216:3eff:fea7:706b vethM2PWUN
lo: inet 127.0.0.1
lo: inet6 ::1
Resources:
Processes: 5
CPU usage:
CPU usage (in seconds): 123
Memory usage:
Memory (current): 2.05GB
Memory (peak): 2.45GB
Network usage:
eth0:
Bytes received: 347.96MB
Bytes sent: 5.86MB
Packets received: 132127
Packets sent: 63110
lo:
Bytes received: 4.07kB
Bytes sent: 4.07kB
Packets received: 52
Packets sent: 52
Snapshots:
test8snap0 (taken at 2019/06/04 08:17 UTC) (stateless)

Restoring a snapshot

$ lxc restore test8 test8snap0

Deleting a snapshot

$ lxc delete test8/test8snap0
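
A snapshot can also be copied out into a brand-new container, which is a convenient way to test a restore without touching the original (the target name here is arbitrary):

$ lxc copy test8/test8snap0 test8-restored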

The default LXD profile puts containers on a private NAT bridge, so they cannot be reached directly from outside. Containers can instead be configured to use a bridge on the host, giving them IP addresses on the same subnet as the host, so they can be reached just as conveniently as the host itself.

Setting up a bridge on the host

Install bridge-utils:

$ sudo apt install bridge-utils

Edit /etc/network/interfaces:

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto enp0s3
iface enp0s3 inet manual

# bridge interface
auto br0
iface br0 inet static
address 192.168.3.6/24
gateway 192.168.3.1
# bridge options
bridge_ports enp0s3
bridge_stp off
bridge_fd 0
bridge_maxwait 0
bridge_waitport 0

This configures a local bridge br0 on the host; networking has to be restarted for it to take effect.
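
On a plain ifupdown setup this typically means restarting the networking service and then checking that br0 is up and carries the address (details may vary with the environment):

$ sudo systemctl restart networking
$ brctl show br0
$ ip addr show br0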

Configuring the container network interface (method 1)

Add a network device eth0 to the container, bridged to the host's local bridge:

$ lxc config device add bst eth0 nic nictype=bridged parent=br0 name=eth0

Note that an existing container device with the same name eth0 is simply overridden. The newly added device obtains its IP address via DHCP by default; to assign a static IP, edit the container's /etc/network/interfaces:

$ lxc exec bst -- vim /etc/network/interfaces
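
For reference, a minimal static stanza inside the container might look like this (the address and gateway below are placeholders on the host's subnet):

auto eth0
iface eth0 inet static
address 192.168.3.50/24
gateway 192.168.3.1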

Configuring the container network interface (method 2)

Create a new profile and apply it to existing or newly created containers.

List the existing profiles:

$ lxc profile list
+---------+---------+
|  NAME   | USED BY |
+---------+---------+
| default | 1       |
+---------+---------+

One container is using the default profile. Here is what the default profile contains:

$ lxc profile show default 
config: {}
description: Default LXD profile
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
name: default
used_by:
- /1.0/containers/bst

This is simply the default profile created when lxd init was run.

Now create a new profile for host-bridged networking:

$ lxc profile create hostbridgedprofile
Profile hostbridgedprofile created

Edit the newly created profile:

$ cat <<EOF | lxc profile edit hostbridgedprofile
description: Host Bridged networking LXD profile
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: br0
    type: nic
EOF

Confirm the new profile:

$ lxc profile list
+--------------------+---------+
|        NAME        | USED BY |
+--------------------+---------+
| default            | 1       |
+--------------------+---------+
| hostbridgedprofile | 0       |
+--------------------+---------+

$ lxc profile show hostbridgedprofile
config: {}
description: Host Bridged networking LXD profile
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: br0
    type: nic
name: hostbridgedprofile
used_by: []

Apply the new profile on top of the existing container bst:

$ lxc profile assign bst default,hostbridgedprofile 
Profiles default,hostbridgedprofile applied to bst

Note the order: default is applied first, and hostbridgedprofile then overrides the default network settings.
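
To confirm that the container picked up the bridged interface (the device should now show parent br0, and eth0 should get an address from the host's subnet):

$ lxc config show bst --expanded
$ lxc exec bst -- ip addr show eth0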

A new container can also be launched with the desired profiles specified directly:

$ lxc launch -p default -p hostbridgedprofile images:debian/buster/amd64 new_container

A profile that is no longer needed can be deleted, but only if no container is still using it:

$ lxc profile delete hostbridgedprofile 
Error: Profile is currently in use

Configuring the container network interface (method 3)

Re-run lxd init. Re-initializing the network settings is fine even while container instances exist, but an existing storage pool cannot be changed (new storage pools can of course be added):

$ lxd init
Would you like to use LXD clustering? (yes/no) [default=no]:
Do you want to configure a new storage pool? (yes/no) [default=yes]: no
Would you like to connect to a MAAS server? (yes/no) [default=no]:
Would you like to create a new local network bridge? (yes/no) [default=yes]: no  <= answer no here
Would you like to configure LXD to use an existing bridge or host interface? (yes/no) [default=no]: yes  <= answer yes here
Name of the existing bridge or host interface: br0  <= enter the host bridge interface br0
Would you like LXD to be available over the network? (yes/no) [default=no]:
Would you like stale cached images to be updated automatically? (yes/no) [default=yes]:
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]:

This actually just modifies the default profile.

References:
[1]How to initialize LXD again

Docker is an application-level container technology: a container runs a single main process. LXD, a higher-level wrapper around LXC, is a system-level container technology: like a virtual machine it runs a full guest OS inside the container, only much more lightweight.

As usual the host is Debian, this time running buster.

Installing snap

LXD is an Ubuntu-native project, so only Ubuntu can install it straight from apt; every other distribution has to use snap. So be it.

$ sudo apt install snapd

Installing LXD

$ sudo snap install lxd --channel=3.0/stable

The stable channel is used here.

User permissions and sudo

To manage LXD containers as the current unprivileged user, add the user to the lxd group:

$ sudo adduser $USER lxd

The user has to log out and back in for the new group to take effect.

Because the snap-installed lxd does not live in any of the traditional locations but, oddly, under /snap/bin, /etc/sudoers needs to be edited to add /snap/bin to secure_path:

$ which lxd
/snap/bin/lxd
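
Inside /etc/sudoers (edit it with visudo), the secure_path line then looks roughly like the following; the exact set of default directories may differ on your system:

Defaults secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin"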

Now on to the main topic.

Initializing LXD

$ lxd init

Accepting the defaults with Enter is basically fine; the meaning of each option can be studied in detail later. Go.

Creating containers
Create a Debian buster container instance bst from the official image server:

$ lxc launch images:debian/buster/amd64 bst

To create an Ubuntu container instance ubt instead:

$ lxc launch ubuntu:18.04 ubt

Ubuntu images live under the ubuntu remote, while every other distribution lives under images. Once again, so be it.

Listing containers

$ lxc list
+------+---------+---------------------+-----------------------------------------------+------------+-----------+
| NAME |  STATE  |        IPV4         |                     IPV6                      |    TYPE    | SNAPSHOTS |
+------+---------+---------------------+-----------------------------------------------+------------+-----------+
| bst  | RUNNING | 10.132.77.54 (eth0) | fd42:2d28:4331:ad36:216:3eff:fed5:4b5c (eth0) | PERSISTENT | 0         |
+------+---------+---------------------+-----------------------------------------------+------------+-----------+

Inspecting a container instance

$ lxc info bst
Name: bst
Remote: unix://
Architecture: x86_64
Created: 2019/06/02 03:19 UTC
Status: Running
Type: persistent
Profiles: default
Pid: 1166
Ips:
eth0: inet 10.132.77.54 vethKTBXNA
eth0: inet6 fd42:2d28:4331:ad36:216:3eff:fed5:4b5c vethKTBXNA
eth0: inet6 fe80::216:3eff:fed5:4b5c vethKTBXNA
lo: inet 127.0.0.1
lo: inet6 ::1
Resources:
Processes: 6
CPU usage:
CPU usage (in seconds): 7
Memory usage:
Memory (current): 211.66MB
Memory (peak): 297.09MB
Network usage:
eth0:
Bytes received: 1.43MB
Bytes sent: 69.46kB
Packets received: 966
Packets sent: 902
lo:
Bytes received: 0B
Bytes sent: 0B
Packets received: 0
Packets sent: 0

Interacting with containers
Get a shell inside the container:

$ lxc exec first -- /bin/bash

Or run a one-off command:

$ lxc exec first -- apt install procps

The images from the image server are quite minimal, so even basic tools have to be installed by hand. For example, the procps package provides free, kill, pkill, pgrep, pmap, ps, pwdx, skill, slabtop, snice, sysctl, tload, top, uptime, vmstat, w, and watch.
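
Once procps is in place, the usual tools work inside the container:

$ lxc exec first -- free -m
$ lxc exec first -- ps aux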

Pull a file out of the container:

$ lxc file pull first/etc/hosts .

Push a file into the container:

$ lxc file push hosts first/etc/

Push a directory into the container:

$ lxc file push -r folder first/path/to

Stop a container:

$ lxc stop bst

Delete a container completely:

$ lxc delete bst

Managing a remote LXD server
The lxc command-line tool can manage both local and remote LXD servers; server here means the running lxd daemon that manages the containers.

To manage a remote LXD server, first run the following on the remote server:

$ lxc config set core.https_address "[::]"
$ lxc config set core.trust_password some-password

The first command makes the LXD service listen on port 8443 on all local addresses.
The second command sets the password credential used for access.

The remote LXD server can then be added on the local machine:

$ lxc remote add host-a <ip address or DNS name>

You will be shown the server's certificate fingerprint and asked for the password set in the previous step. After that, the remote LXD server can be managed just like the local one, except that the remote's alias has to be given explicitly:

$ lxc exec host-a:bst -- /bin/bash

That is how the remote LXD server's container instances are managed from the local machine.

More exploration to follow…

Updated(03/16/2022):

If the ZFS storage backend is in use, LXD needs to be updated to the 4.0.x stable release after OpenZFS (zfs-dkms) is updated; otherwise all sorts of problems appear.

Update to the current LTS stable release 4.0.x:

$ sudo snap refresh lxd --channel=4.0/stable
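
After the refresh, the installed version can be double-checked with:

$ snap list lxd
$ lxd --version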

First, the Guest Additions need to be installed.

On the host:

Configure a shared folder for the guest (not covered here).

On the guest:

List the available shared folders:

$ sudo VBoxControl sharedfolder list
Oracle VM VirtualBox Guest Additions Command Line Management Interface Version 6.0.8
(C) 2008-2019 Oracle Corporation
All rights reserved.

Shared Folder mappings (1):

01 - Downloads [idRoot=0 writable auto-mount host-icase guest-icase mnt-pt=/media/host/]

If auto-mount is enabled, VirtualBox mounts the folder automatically; otherwise it can be mounted by hand:

$ sudo mount -t vboxsf Downloads /media/host

To sort out access permissions, add the current user to the vboxsf group:

$ sudo adduser $USER vboxsf

Then log out and back in, or run:

$ newgrp vboxsf
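
To confirm that the group membership is active and the share is readable (the mount point matches the auto-mount path listed above):

$ id -nG
$ ls /media/host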

Installing the VirtualBox Guest Additions on a GUI-less Debian buster

On the host:

Start the guest, then click Devices -> Insert Guest Additions CD image… in the VirtualBox menu.

On the guest:

Install the kernel module build dependencies:

$ sudo apt-get install -y dkms build-essential linux-headers-$(uname -r)

Mount the CD-ROM and install the Guest Additions:

$ sudo mount /dev/cdrom /media/cdrom
$ cd /media/cdrom
$ sudo su
# ./VBoxLinuxAdditions.run
Verifying archive integrity... All good.
Uncompressing VirtualBox 6.0.8 Guest Additions for Linux........
VirtualBox Guest Additions installer
Removing installed version 6.0.8 of VirtualBox Guest Additions...
Copying additional installer modules ...
Installing additional modules ...
VirtualBox Guest Additions: Starting.
VirtualBox Guest Additions: Building the VirtualBox Guest Additions kernel
modules. This may take a while.
VirtualBox Guest Additions: To build modules for other installed kernels, run
VirtualBox Guest Additions: /sbin/rcvboxadd quicksetup <version>
VirtualBox Guest Additions: or
VirtualBox Guest Additions: /sbin/rcvboxadd quicksetup all
VirtualBox Guest Additions: Building the modules for kernel 4.19.0-5-amd64.
update-initramfs: Generating /boot/initrd.img-4.19.0-5-amd64
VirtualBox Guest Additions: Running kernel modules will not be replaced until
the system is restarted

Reboot the guest:

# reboot

Verify the installation:

$ lsmod | grep vboxguest
vboxguest 348160 2 vboxsf

The installation succeeded.

References:
[1]How to install VirtualBox Guest Additions on a GUI-less Ubuntu server host

A very old server keeps spitting out error messages on the terminal:

[ 1250.944486] mce: [Hardware Error]: Machine check events logged
[ 1250.944493] [Hardware Error]: Corrected error, no action required.
[ 1250.948666] [Hardware Error]: CPU:24 (10:9:1) MC4_STATUS[OverCEMiscV-AddrVCECC]: 0xdc0a400079080a13
[ 1250.952631] [Hardware Error]: Error Addr: 0x00000004abffce80
[ 1250.952633] [Hardware Error]: MC4 Error (node 6): DRAM ECC error detected on the NB.
[ 1250.952654] EDAC MC6: 1 CE on mc#6csrow#3channel#0 (csrow:3 channel:0 page:0x4abffc offset:0xe80 grain:0 syndrome:0x7914)
[ 1250.952656] [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: RES (no timeout)

So the memory is reporting errors. They were corrected, but the RAM is clearly starting to fail.

First, a look at the system's CPU and NUMA layout:

$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 1
Core(s) per socket: 8
Socket(s): 4
NUMA node(s): 8
Vendor ID: AuthenticAMD
CPU family: 16
Model: 9
Model name: AMD Opteron(tm) Processor 6128
Stepping: 1
CPU MHz: 800.000
CPU max MHz: 2000.0000
CPU min MHz: 800.0000
BogoMIPS: 4000.04
Virtualization: AMD-V
L1d cache: 64K
L1i cache: 64K
L2 cache: 512K
L3 cache: 5118K
NUMA node0 CPU(s): 0-3
NUMA node1 CPU(s): 4-7
NUMA node2 CPU(s): 8-11
NUMA node3 CPU(s): 12-15
NUMA node4 CPU(s): 16-19
NUMA node5 CPU(s): 20-23
NUMA node6 CPU(s): 24-27
NUMA node7 CPU(s): 28-31
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr hw_pstate vmmcall npt lbrv svm_lock nrip_save pausefilter

There are four sockets, i.e. four CPUs with eight cores each, 32 cores in total. On a NUMA machine, each CPU generally has at least one local memory controller.
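
The per-node memory layout can be cross-checked with numactl (not installed by default):

$ sudo apt install numactl
$ numactl --hardware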

Install edac-utils and look at the memory controller information:

$ sudo apt install edac-utils
$ edac-util -vs
edac-util: EDAC drivers are loaded. 4 MCs detected:
mc0:F10h
mc2:F10h
mc4:F10h
mc6:F10h

There are indeed four memory controllers. Next, check each controller for recorded errors:


$ edac-util -v
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
mc0: csrow2: 0 Uncorrected Errors
mc0: csrow2: mc#0csrow#2channel#0: 0 Corrected Errors
mc0: csrow3: 0 Uncorrected Errors
mc0: csrow3: mc#0csrow#3channel#0: 0 Corrected Errors
mc2: 0 Uncorrected Errors with no DIMM info
mc2: 0 Corrected Errors with no DIMM info
mc2: csrow2: 0 Uncorrected Errors
mc2: csrow2: mc#2csrow#2channel#0: 0 Corrected Errors
mc2: csrow3: 0 Uncorrected Errors
mc2: csrow3: mc#2csrow#3channel#0: 0 Corrected Errors
mc4: 0 Uncorrected Errors with no DIMM info
mc4: 0 Corrected Errors with no DIMM info
mc4: csrow2: 0 Uncorrected Errors
mc4: csrow2: mc#4csrow#2channel#0: 0 Corrected Errors
mc4: csrow3: 0 Uncorrected Errors
mc4: csrow3: mc#4csrow#3channel#0: 0 Corrected Errors
mc6: 0 Uncorrected Errors with no DIMM info
mc6: 0 Corrected Errors with no DIMM info
mc6: csrow2: 0 Uncorrected Errors
mc6: csrow2: mc#6csrow#2channel#0: 2 Corrected Errors
mc6: csrow3: 0 Uncorrected Errors
mc6: csrow3: mc#6csrow#3channel#0: 4 Corrected Errors

Or read the counters directly from sysfs:

$ grep [0-9] /sys/devices/system/edac/mc/mc*/csrow*/*ce_count
/sys/devices/system/edac/mc/mc0/csrow2/ce_count:0
/sys/devices/system/edac/mc/mc0/csrow2/ch0_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow3/ce_count:0
/sys/devices/system/edac/mc/mc0/csrow3/ch0_ce_count:0
/sys/devices/system/edac/mc/mc2/csrow2/ce_count:0
/sys/devices/system/edac/mc/mc2/csrow2/ch0_ce_count:0
/sys/devices/system/edac/mc/mc2/csrow3/ce_count:0
/sys/devices/system/edac/mc/mc2/csrow3/ch0_ce_count:0
/sys/devices/system/edac/mc/mc4/csrow2/ce_count:0
/sys/devices/system/edac/mc/mc4/csrow2/ch0_ce_count:0
/sys/devices/system/edac/mc/mc4/csrow3/ce_count:0
/sys/devices/system/edac/mc/mc4/csrow3/ch0_ce_count:0
/sys/devices/system/edac/mc/mc6/csrow2/ce_count:3
/sys/devices/system/edac/mc/mc6/csrow2/ch0_ce_count:3
/sys/devices/system/edac/mc/mc6/csrow3/ce_count:6
/sys/devices/system/edac/mc/mc6/csrow3/ch0_ce_count:6

The errors are on MC6, csrow2 and csrow3, which means the faulty module is the DIMM0 stick on channel 0 of the fourth memory controller (i.e. the fourth CPU).
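
Since these are corrected errors the machine keeps running, but the counters are worth keeping an eye on; one simple way (assuming edac-utils stays installed) is:

$ watch -n 60 edac-util -v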

References:
[1]How to identify defective DIMM from EDAC error on Linux
[2]内存条物理结构分析