[IEC-26] [IEC][SEBA][PONSim] cord-tester setup_venv fails: pynacl pwhash_scrypt out of memory Created: 25/Oct/19  Updated: 30/Oct/19  Resolved: 30/Oct/19

Status: Done
Project: Integrated Edge Cloud
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Medium
Reporter: Ciprian Barbu Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

This problem happens on aarch64 when preparing the environment for the cord-tester testing framework. Because no prebuilt pynacl Python package is available for aarch64, setup_venv.sh builds it from source and, in the process, also runs the tests (i.e. calls make check). One of the 72 tests (pwhash_scrypt) fails after timing out with what seems to be an out-of-memory error.

The same problem was reported last year here:
https://github.com/jedisct1/libsodium/issues/721

One of the solutions suggested is to increase the memlock ulimit, which for some reason is very low by default on aarch64.
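As a rough illustration of the memlock workaround, the sketch below uses Python's standard resource module to inspect RLIMIT_MEMLOCK and raise the soft limit to the hard limit. The helper names are made up for this example and are not part of cord-tester. Raising the hard limit itself still needs privileges or container configuration.

```python
import resource


def describe(limit):
    """Format a memlock limit in bytes for display."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit // 1024} KiB"


def raise_memlock_soft_limit():
    """Raise the soft RLIMIT_MEMLOCK to the hard limit (no privileges needed).

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    if soft != hard:
        resource.setrlimit(resource.RLIMIT_MEMLOCK, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


if __name__ == "__main__":
    before = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    after = raise_memlock_soft_limit()
    print("memlock before:", describe(before[0]), "/", describe(before[1]))
    print("memlock after: ", describe(after[0]), "/", describe(after[1]))
```

The limit is inherited by child processes, so this only helps if the build is launched from the process that raised it.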

I tried setting this ulimit in the container we use for testing, and it works when running on the Jenkins slave (bare metal). However, the test still crashes when the framework is run from the same cord-tester image on the master node of an aarch64 virtual pod.

 



 Comments   
Comment by Ciprian Barbu [ 29/Oct/19 ]

My bug was closed on the libsodium github repo, but I replied to this thread on pynacl:

https://github.com/pyca/pynacl/issues/486

Comment by Ciprian Barbu [ 29/Oct/19 ]

I have opened a new issue on the libsodium GitHub repository:

https://github.com/jedisct1/libsodium/issues/890

Comment by Ciprian Barbu [ 29/Oct/19 ]

So it looks like there are two problems here. Most of the time (if not always), the pynacl build fails during the tests because the ulimit for max locked memory is very low. This is also described in the bug report mentioned in the description.

The second problem was seen when running the build on the IEC master node. Here the test would suddenly start taking up more and more memory, exhausting all of the available RAM and then going on to consume the swap memory.

I eventually started debugging this test case and found that there is a negative test in pwhash_scrypt [1]. By the looks of it, the test passes an invalid encoded string, which is probably supposed to cause a failure further down the line when memory is allocated [2] for the decryption process. I noticed that the computed memory requirement is in the range of terabytes, an allocation that is clearly not supposed to succeed.
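To give a feel for how an scrypt request ends up in the terabyte range, here is a small sketch based on the standard scrypt sizing from RFC 7914: the large V array takes 128 * r * N bytes and the B buffer 128 * r * p bytes. The parameter values below are hypothetical and are not the ones decoded from the test string. libsodium's smaller scratch buffers are left out.

```python
def scrypt_memory_bytes(n_log2, r, p):
    """Approximate scrypt memory use in bytes (V array plus B buffer).

    n_log2 is log2 of the CPU/memory cost N, r the block size factor,
    p the parallelization factor.
    """
    n = 1 << n_log2
    v_bytes = 128 * r * n  # dominant term: the N-entry lookup table
    b_bytes = 128 * r * p  # per-lane working blocks
    return v_bytes + b_bytes


if __name__ == "__main__":
    # Hypothetical parameter sets, chosen only to show the scale.
    for n_log2, r, p in [(14, 8, 1), (32, 8, 1)]:
        size = scrypt_memory_bytes(n_log2, r, p)
        print(f"N=2^{n_log2}, r={r}, p={p}: {size / 2**30:.3f} GiB")
```

Because memory grows linearly with N, a corrupted cost field in the encoded string is enough to turn a request of a few MiB into one of several TiB.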

Searching for references on mmap not failing when large amounts of memory are requested, I ended up on this thread [3], called:
In a 64 bit process, will my mmap / malloc request ever be denied?

One of the answers in this thread talks about the memory overcommit setting, which can be set via sysctl vm.overcommit_memory. This parameter controls how much overcommit checking the kernel performs during syscalls like mmap and related calls. On our IEC K8S master, this was set to 1, which means the kernel always overcommits and performs no checks.
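For reference, the kernel exposes this setting as vm.overcommit_memory, readable at /proc/sys/vm/overcommit_memory on Linux. The sketch below reads and interprets it; the function names are made up for this example, and the mode descriptions paraphrase the kernel documentation.

```python
from pathlib import Path

OVERCOMMIT_PATH = Path("/proc/sys/vm/overcommit_memory")

# Meaning of each mode, paraphrased from the kernel's overcommit documentation.
MODES = {
    0: "heuristic: obvious overcommits are refused",
    1: "always overcommit: allocation requests are never checked",
    2: "strict: total commitments are capped by the commit limit",
}


def parse_overcommit_mode(text):
    """Parse the content of the overcommit_memory file into a mode number."""
    mode = int(text.strip())
    if mode not in MODES:
        raise ValueError(f"unexpected overcommit mode: {mode}")
    return mode


def performs_overcommit_checks(mode):
    """Return True when the kernel can refuse an oversized mmap request."""
    return mode in (0, 2)


if __name__ == "__main__":
    mode = parse_overcommit_mode(OVERCOMMIT_PATH.read_text())
    print(f"vm.overcommit_memory = {mode} ({MODES[mode]})")
    if not performs_overcommit_checks(mode):
        print("warning: huge allocations will not be rejected up front")
```

Running a check like this before the build makes it easy to spot a node configured like the IEC K8S master.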

Additionally, libsodium's scrypt alloc_region function forces mmap to also populate the pages. This is useful under normal circumstances, but in this case, combined with vm.overcommit_memory set to 1, it caused the OS to exhaust all the memory while trying to satisfy the mmap request.

So the solution is to set vm.overcommit_memory to a value that enables overcommit checks; both 0 and 2 work fine.

But in my opinion the libsodium implementation, or maybe the test case, is not very well thought out, since setting the value to 1 is a valid way of configuring the OS to allow better control over memory in virtualized environments. One could also consider this a security threat.

[1] https://github.com/jedisct1/libsodium/blob/1.0.16/test/default/pwhash_scrypt.c#L269

[2] https://github.com/jedisct1/libsodium/blob/1.0.16/src/libsodium/crypto_pwhash/scryptsalsa208sha256/scrypt_platform.c#L45

[3] https://stackoverflow.com/questions/4923116/in-a-64-bit-process-will-my-mmap-malloc-request-ever-be-denied
