[IEC-36] [IEC][SEBA] Investigate using Akraino UNH ThunderX2 Pod 2 for SEBA validation Created: 30/Mar/20  Updated: 17/Jun/20  Resolved: 17/Jun/20

Status: Done
Project: Integrated Edge Cloud
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Medium
Reporter: Ciprian Barbu Assignee: Ciprian Barbu
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

For arm64 deployments, SEBA validation is best run on bare metal. We have used some older ThunderX v1 PODs internally at ENEA, and we also tested on one bare-metal ThunderX2 POD, which showed a considerable improvement.

Since there is a community lab POD at Akraino UNH that is no longer used by other projects, we could make use of it and eventually integrate it into the Akraino CI.

There might be a problem though, as there are specific network configurations needed to deploy IEC. SEBA itself does not have specific requirements.

There is some networking info on the LAB page: https://wiki.akraino.org/display/AK/ThunderX2+Pod+2

Also there are requirements for IEC networking here: https://wiki.akraino.org/display/AK/IEC+Validation+Lab

This shows a minimum requirement of 3 independent networks, where at least one must be on a native VLAN for PXE booting.

We need to check that the UNH POD meets the requirements and, if needed, make the necessary changes with the help of LF IT.
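
As a rough illustration of the check described above, the following Python sketch tests a POD's network list against the two stated requirements: at least 3 independent networks, and at least one PXE network on a native VLAN. The PodNetwork type, its field names, and the rule that two networks are independent when their (interface, VLAN) pair differs are all assumptions made for this example. They do not come from the IEC Validation Lab page or from the Fuel installer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PodNetwork:
    """Hypothetical description of one POD network (not an IEC/Fuel schema)."""
    name: str                # illustrative label, e.g. "admin"
    interface: str           # illustrative server interface name
    vlan: Optional[int]      # None means native (untagged) VLAN
    pxe: bool = False        # True if the servers PXE boot on this network


def check_iec_networks(networks: List[PodNetwork]) -> List[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []

    # Assumption: networks are independent if their (interface, vlan) differs.
    segments = {(n.interface, n.vlan) for n in networks}
    if len(segments) < 3:
        problems.append(
            f"need at least 3 independent networks, found {len(segments)}"
        )

    # At least one PXE network must be on a native (untagged) VLAN.
    if not any(n.pxe and n.vlan is None for n in networks):
        problems.append("no PXE network on a native (untagged) VLAN")

    return problems
```

Calling check_iec_networks() with the networks actually configured on the UNH POD would return the unmet requirements, if any. The real interface names and VLAN IDs have to be taken from the lab page.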



 Comments   
Comment by Ciprian Barbu [ 17/Jun/20 ]

After various problems, including gigabyte6 misbehaving in the same way as gigabyte4 (unable to boot due to empty flash memory), I managed to deploy IEC Type 2 on the POD.

The network configuration on the Edgecore switch has been altered so that the servers can PXE boot on one of the internal interfaces. This is because the BIOS for the Gigabyte ThunderX2 R281-T91 does not support booting from any other interface, even with the latest firmware.

I also upgraded the BIOS to the latest available version (F29) on all 3 servers, as well as the BMC firmware (Tencent 1.98).

The next step is to add ThunderX2-pod2 to CI.

Comment by Ciprian Barbu [ 11/Jun/20 ]

With the jumphost ready, I created the necessary JSON configuration files for the Fuel deployment. I then started looking at how to configure the network and found that we need more VLANs configured on the switch.

The lab IT team instructed me to configure the switch myself, but there were a number of issues. One of them was that the breakout cables didn't work. They are needed because the ThunderX2 servers only PXE boot from the internal interfaces, which are connected via breakout cables.

While I was changing the VLAN configuration, one of the servers, gigabyte4, stopped working. According to the serial console, the server's SPI NOR memory, which is probably where the first-stage bootloader resides, was empty. I asked some people at Gigabyte for help, and they told me to update the firmware and disconnect power from the server for a while.

This worked at the time, but when I started trying out deployments with Fuel, the server ran into the same problem. I will need to verify a few things to see what could cause this, but I'm afraid the server might need to be replaced.

I'm also having issues with the installer: for some reason MaaS is not able to commission/provision the servers. They boot fine in PXE mode, but shortly after cloud-init starts running and installing a few packages, the servers are rebooted, and this repeats in an infinite loop. I will need some help figuring out this issue, as I'm not at all familiar with MaaS.

Comment by Ciprian Barbu [ 02/Jun/20 ]

Update:

After a lot of back and forth and syncing with the IT team, the jumphost in ThunderX2-pod2 (designated as ampere-junmphost2) is now configured, so we can start deploying IEC on the POD.

At this point I need to create corresponding idf and pod files and start trying out deployments.

Comment by Ciprian Barbu [ 15/May/20 ]

Update:

Since ThunderX2-POD1 no longer has a jumpserver, we need to switch over to ThunderX2-POD2. We have already arranged with the ELIOT team to switch between the two, and reserved the POD in the Community Lab Calendar.

Comment by Ciprian Barbu [ 30/Mar/20 ]

Some more useful information: the Fuel installer used to deploy IEC needs two YAML files that describe a particular POD. See these examples for the kind of information needed:

https://github.com/ciprian-barbu/pharos/tree/akraino-iec

Generated at Sat Feb 10 06:02:19 UTC 2024 using Jira 9.4.5#940005-sha1:e3094934eac4fd8653cf39da58f39364fb9cc7c1.