[VAL-91] LTP test case failure: "SSH session not active" Created: 28/Nov/19 Updated: 05/Dec/19 Resolved: 05/Dec/19 |
|
| Status: | Done |
| Project: | Validation |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Medium |
| Reporter: | Juha Kosonen | Assignee: | Juha Kosonen |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Description |
|
==============================================================================
Ltp :: Validation, robustness and stability of Linux
==============================================================================
RunLTP all tests :: Wait ~5hrs to complete 2536 tests                 | FAIL |
'INFO: creating /opt/ltp/results directory
INFO: no command files were provided. Executing following runtest scenario files:
syscalls fs fs_perms_simple fsx dio io mm ipc sched math nptl pty containers fs_bind controllers filecaps cap_bounds fcntl-locktests connectors power_management_tests hugetlb commands hyperthreading can cpuhotplug net.ipv6_lib input cve crypto kernel_misc uevent
Checking for required user/group ids
'nobody' user id and group found.
'bin' user id and group found.
'daemon' user id and group found.
Users group found.
Sys group found.
Required users/groups exist.
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.
/etc/centos-release
[ Message content over the limit has been removed. ]
...g_usage_in_bytes_test 2 TINFO: Process is still here after warm up: 1312173
memcg_usage_in_bytes_test 2 TFAIL: memory.memsw.usage_in_bytes is 4202496, 4194304 expected
<<<execution_status>>>
initiation_status="ok"
duration=2 termination_type=exited termination_id=1 corefile=no
cutime=4 cstime=15
<<<test_end>>>
<<<test_start>>>
tag=memcg_stress stime=1574848339
cmdline="memcg_stress_test.sh"
contacts="" analysis=exit
<<<test_output>>>
memcg_stress_test 1 TINFO: timeout per run is 0h 35m 0s
memcg_stress_test 1 TINFO: Calculated available memory 178387 MB
memcg_stress_test 1 TINFO: Testing 150 cgroups, using 1188 MB, interval 5
memcg_stress_test 1 TINFO: Starting cgroups
memcg_stress_test 1 TINFO: Testing cgroups for 900s'
does not contain 'INFO: ltp-pan reported all tests PASS'
Also teardown failed: Several failures occurred:
1) SSHException: SSH session not active
2) There was no directory matching '/opt/ltp/output'.
3) SSHException: SSH session not active
4) SSHException: SSH session not active
5) There was no directory matching '/opt/ltp/results'.
6) SSHException: SSH session not active
------------------------------------------------------------------------------
Ltp :: Validation, robustness and stability of Linux                 | FAIL |
Suite teardown failed: Several failures occurred:
1) SSHException: SSH session not active
2) SSHException: SSH session not active
1 critical test, 0 passed, 1 failed
1 test total, 0 passed, 1 failed
==============================================================================
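For anyone reproducing this, the failing memcg cases can be re-run in isolation instead of through the ~5h full suite. A minimal sketch, assuming the standard /opt/ltp install shown in the log (runltp's -f option selects scenario files, -s filters cases by tag pattern):

# Re-run only the memcg cases from the controllers scenario file,
# writing the log and test output to /tmp for inspection
cd /opt/ltp
./runltp -f controllers -s memcg -l /tmp/memcg.log -o /tmp/memcg.out

# The failing assertion compares this counter; note that memory.memsw.*
# only exists under cgroup v1 when swap accounting is enabled
cat /sys/fs/cgroup/memory/memory.memsw.usage_in_bytes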
|
| Comments |
| Comment by Juha Kosonen [ 05/Dec/19 ] |
| Comment by Juha Kosonen [ 28/Nov/19 ] |
|
A patch was submitted for review: https://gerrit.akraino.org/r/c/validation/+/2075 |
| Comment by Juha Kosonen [ 28/Nov/19 ] |
|
At some phase during the test run, even though ping works, SSH connectivity hangs for long periods of time:
[cloudadmin@controller-1 ~]$ ping -c2 controller-2
PING controller-2 (192.168.17.4) 56(84) bytes of data.
64 bytes from controller-2 (192.168.17.4): icmp_seq=1 ttl=64 time=0.068 ms
64 bytes from controller-2 (192.168.17.4): icmp_seq=2 ttl=64 time=0.155 ms

--- controller-2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 0.068/0.111/0.155/0.044 ms

[cloudadmin@controller-1 ~]$ time ssh controller-2
^C

real    1m14.575s
user    0m0.003s
sys     0m0.004s

The node in question is detected as non-functional on the k8s level too:

[cloudadmin@controller-1 ~]$ kubectl get no
NAME           STATUS     ROLES    AGE     VERSION
192.168.17.1   Ready      master   7d23h   v1.16.2
192.168.17.2   Ready      worker   7d23h   v1.16.2
192.168.17.3   Ready      worker   7d23h   v1.16.2
192.168.17.4   NotReady   master   7d23h   v1.16.2
192.168.17.5   Ready      master   7d23h   v1.16.2

Finally the node restarts:

Events:
  Type     Reason                   Age                From                      Message
  ----     ------                   ----               ----                      -------
  Normal   NodeNotReady             66m                kubelet, 192.168.17.4     Node 192.168.17.4 status is now: NodeNotReady
  Warning  SystemOOM                66m                kubelet, 192.168.17.4     System OOM encountered, victim process: nginx, pid: 1227263
  Warning  SystemOOM                66m                kubelet, 192.168.17.4     System OOM encountered, victim process: nginx, pid: 1227264
  Warning  SystemOOM                66m                kubelet, 192.168.17.4     System OOM encountered, victim process: nginx, pid: 1227210
  Warning  SystemOOM                66m                kubelet, 192.168.17.4     System OOM encountered, victim process: sh, pid: 1227067
  Warning  SystemOOM                66m                kubelet, 192.168.17.4     System OOM encountered, victim process: memcached, pid: 1227086
  Warning  SystemOOM                66m                kubelet, 192.168.17.4     System OOM encountered, victim process: rsync, pid: 1227069
  Warning  SystemOOM                66m                kubelet, 192.168.17.4     System OOM encountered, victim process: supervisord, pid: 1226460
  Warning  SystemOOM                66m                kubelet, 192.168.17.4     System OOM encountered, victim process: crond, pid: 1227660
  Warning  SystemOOM                66m                kubelet, 192.168.17.4     System OOM encountered, victim process: memcg_process_s, pid: 1299961
  Warning  SystemOOM                66m (x3 over 66m)  kubelet, 192.168.17.4     (combined from similar events): System OOM encountered, victim process: memcg_process_s, pid: 1299953
  Normal   NodeHasSufficientMemory  66m (x2 over 66m)  kubelet, 192.168.17.4     Node 192.168.17.4 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    66m (x2 over 66m)  kubelet, 192.168.17.4     Node 192.168.17.4 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     66m (x2 over 66m)  kubelet, 192.168.17.4     Node 192.168.17.4 status is now: NodeHasSufficientPID
  Normal   Starting                 66m                kubelet, 192.168.17.4     Starting kubelet.
  Normal   NodeAllocatableEnforced  66m                kubelet, 192.168.17.4     Updated Node Allocatable limit across pods
  Normal   NodeReady                66m                kubelet, 192.168.17.4     Node 192.168.17.4 status is now: NodeReady
  Normal   Starting                 31m                kube-proxy, 192.168.17.4  Starting kube-proxy.
  Normal   Starting                 11m                kubelet, 192.168.17.4     Starting kubelet.
  Normal   NodeHasSufficientMemory  11m (x8 over 11m)  kubelet, 192.168.17.4     Node 192.168.17.4 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    11m (x8 over 11m)  kubelet, 192.168.17.4     Node 192.168.17.4 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     11m (x7 over 11m)  kubelet, 192.168.17.4     Node 192.168.17.4 status is now: NodeHasSufficientPID
  Normal   NodeAllocatableEnforced  11m                kubelet, 192.168.17.4     Updated Node Allocatable limit across pods

There is plenty of memory in the node:
[cloudadmin@controller-2 ~]$ lsmem
RANGE                                  SIZE  STATE REMOVABLE  BLOCK
0x0000000000000000-0x000000007fffffff    2G online        no      0
0x0000000100000000-0x0000002b7fffffff  170G online       yes   2-86
0x0000002b80000000-0x0000002f7fffffff   16G online        no  87-94
0x0000002f80000000-0x000000307fffffff    4G online       yes  95-96

Memory block size:       2G
Total online memory:   192G
Total offline memory:    0B
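Note that 192G of installed memory does not rule out the OOM events above: per the suite log, memcg_stress_test sized itself to the calculated available memory (150 cgroups x 1188 MB = 178200 MB, essentially all of the 178387 MB it detected), so driving the node into system-wide OOM during the run is plausible. A quick way to confirm OOM-killer activity from the kernel log (standard tooling, nothing test-specific):

# Kernel messages only, recent window, OOM-killer lines
journalctl -k --since "3 hours ago" | grep -iE 'out of memory|oom[- ]kill'

# Fallback if journald history is unavailable on the node
dmesg -T | grep -i 'killed process'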
At this stage, I suggest removing the particular test case which runs all LTP test suites. It can be added back later if considered reasonable, and after it has been verified as fully functional; see the sketch below.
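If outright removal feels too drastic, the case could instead be tagged and excluded by default, keeping it available for opt-in runs. A hypothetical sketch for a Robot Framework suite like this one (the tag name and test path are illustrative, not taken from the actual patch):

# Skip the long-running full-LTP case by tag when invoking Robot
robot --exclude full-ltp-run tests/ltp/
|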
| Comment by Juha Kosonen [ 28/Nov/19 ] |
|
Yes, let's wait for a while. |
| Comment by Cristina Pauna [ 28/Nov/19 ] |
|
Do you want to wait for this to be fixed before I make a new tag? The new images won't be built until next week anyway because of Thanksgiving in the US. |
| Comment by Juha Kosonen [ 28/Nov/19 ] |
|
cristinapauna, FYI |