\chapter{Testing}
\label{cha:testing}

With the setup described in the previous chapter, we created a system which is able to capture and process biometric data. The system encapsulates this data into an attestation message and sends it to the PIA, which acts as the DAA verifier in this configuration. In the following, we show how well each part of the setup works and what performance the tested devices achieve. Furthermore, we analyze the footprint in memory as well as on disk. The tests are only applied to Systems 1 and 3 since System 2 has a hardware configuration comparable to System 3 but uses a CPU of an older generation. Furthermore, only two TPMs support the cryptographic operations required for ECDAA. We discuss this issue in further detail in \autoref{sec:limitations}. Consequently, System 2 was used as DAA verifier since this host does not require a TPM.

\section{Trusted Boot}
The first part of the setup is trusted boot, which is well integrated in recent releases of the kernel and the GRUB bootloader. Furthermore, unlocking the optional disk encryption with the TPM works seamlessly with the kernel, even when using the manually generated unified kernel without GRUB. Only when GRUB itself is updated does it modify the entries in the EFI boot manager. Consequently, we recommend checking the boot entries after a system upgrade. Having a backup boot option with GRUB is useful for maintenance tasks. Especially for modifying the IMA configuration via the kernel command line, it may be necessary to boot with a backup kernel. Hence, a backup boot process is strongly recommended for test setups. This requires a backup password for the disk encryption, which allows bypassing the TPM during booting. Otherwise, no updates are possible with the current setup since the affected PCRs are used by the EFI bootloader and cannot be precomputed. GRUB already supports trusted boot, and activating it requires a single line in the corresponding config file. Unlike GRUB, the unified kernel does not perform any measurements. Only when asking the TPM for the disk encryption key must the initramfs have the TPM stack available. These files use about 62\,KB of space in the initramfs, which is negligible compared with the complete image of about 80\,MB. Furthermore, only in this case is there a measurable difference in boot performance, since asking the TPM takes around 4\,s. This is the time during which the boot process posts \texttt{unlocking via TPM} and waits until the disk is unlocked. Memory usage was not tested here, but it should be comparable to the results when generating a TPM key for the member. According to these results, we assume that less than 10\,KB are used to hold the TPM stack in memory.

\section{IMA}
IMA appears to have an easy setup but a complex set of consequences. First, we show the impact on disk usage when IMA is enabled. In this case, the file hashes are stored as extended file attributes. According to the xattr man page~\cite{XattrMan2021}, the ext4 file system therefore creates a single block per file in which all extended attributes must fit. The block size of the used file system is 4\,KB. The number of files is determined with the following command:
\begin{lstlisting}[numbers=none]
find / -fstype ext4 -type f -uid 0 | wc -l
\end{lstlisting}
For example, after being set up as a biometric sensor and after several system updates, System 1 holds 156947 files. This results in an estimated additional disk usage of 613\,MB when adding the \texttt{xattr} block for each file.
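The estimate follows directly from one additional 4\,KB block per file:
\[
156947 \times 4\,\text{KB} = 627788\,\text{KB} \approx 613\,\text{MB}.
\]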
Further attributes, however, would not require more disk space on ext4 since every extended attribute must fit into this single block. IMA holds all runtime information as virtual files in memory, including the integrity log. Regarding memory usage, the size of these virtual files without the redundant information (like PCR number and log format) serves only as a rough estimate for the memory usage within the kernel, since we did not analyze the corresponding data structures inside the kernel. Both memory and disk usage do not change between IMA's fixing and enforcing mode since enforcing only adds file access control. Saving the log file to disk uses about 2\,MB per 10000 lines in the log. This indicates a linear dependency of both the log size and the kernel's memory usage on the number of accessed files. The log can easily exceed 100000 entries when the system is running long enough. System 1, for example, had a log file with 214561 lines after about 15 days of uptime, resulting in about 40\,MB of size. The log file showed that a major contribution to its size came from the performed system update. Generating the initramfs blob and compiling programs from source together add several tens of thousands of lines to the log.

Regarding the performance of IMA, there is a huge drop when it is enabled, especially when files are read for the first time. \autoref{tab:bootperformance} shows the reboot performance of the used test systems, given the biometric sensor setup described in \autoref{cha:implementation} with TPM-backed disk encryption enabled.
\begin{table}
\renewcommand{\arraystretch}{1.2}
\centering
\caption{Average rebooting times with IMA (n=10)}
\label{tab:bootperformance}
\begin{tabular}{r|ccc|ccc}
\toprule
{Reboot time [s]} &\multicolumn{3}{c|}{\textit{System 1}} &\multicolumn{3}{c}{\textit{System 3}} \\
&\rotatebox{90}{IMA off} &\rotatebox{90}{IMA fix} &\rotatebox{90}{IMA enf} &\rotatebox{90}{IMA off} &\rotatebox{90}{IMA fix} &\rotatebox{90}{IMA enf}\\
\midrule
\textit{Shutdown} & 1.1& 21.0& 20.6& 3.3 &24.5 &24.5 \\
\textit{Boot to disk decryption} & 13.5& 14.1& 14.3& 14.0& 15.4& 15.4 \\
\textit{Decrypt disk} & 3.9& 4.2& 4.2&4.0 &4.2 &4.2 \\
\textit{Boot with disk} &10.2 &27.4 &27.0 &10.6 &31.5 &30.6 \\\hline
\textit{Complete Reboot} &28.7 &66.7 &66.1 &31.8 &75.6 &74.6 \\
\bottomrule
\end{tabular}
\end{table}
This is an example of a reproducible, file-intensive process where all resources have to pass IMA. The procedure was captured via camera and split into four stages. \emph{Shutdown} represents the time from entering \texttt{reboot} into the shell until the monitor stops displaying content. Since all systems use the TPM-backed disk encryption feature, we measured how long it takes to ask the TPM for the passphrase (\emph{Boot to disk decryption}) and how long the query itself takes (\emph{Decrypt disk}). With boot disk and kernel available, the system starts all services until the login prompt appears on the CLI (\emph{Boot with disk}). This ends the reboot procedure. Although the number of runs is very small, the test clearly shows that enabling IMA doubles the time for rebooting. Furthermore, asking the TPM for the decryption key and decrypting the disk takes around 4\,s in every case. This may be caused by the \texttt{tpm2\_unseal} operation, which performs multiple relatively complex cryptographic operations before the task is done.
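This delay can be reproduced in isolation by timing the unseal operation directly. The following is a minimal sketch; the handle \texttt{0x81000001} is a placeholder for the persistent object sealing the disk encryption secret, and the PCR selection must match the actual sealing policy:
\begin{lstlisting}[numbers=none]
# Time the unseal operation in isolation.
# Handle and PCR list are placeholders for the actual setup.
time tpm2_unseal -c 0x81000001 -p pcr:sha256:0,2,4,7 > /dev/null
\end{lstlisting}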
\section{Processing and Sending Biometric Data}
Capturing and processing biometric data from the user is quite seamless, and the cryptographic part of ECDAA is reliable enough for this prototype. During the following tests, all software and hardware parts worked as expected; neither TPM nor software errors were encountered. Analyzing the occupied resources is only meaningful for the DAA member. The implemented prototype of the DAA issuer only negotiates the membership key. Revocation lists and group management are not implemented yet, although the ECDAA library provides data structures and functions for that. Similarly, the DAA verifier only checks the signature of the received message. In a production setup, both entities must hold the revocation list and perform further checks to trust the DAA member and its messages. We split the tasks of a Digidow sensor into several parts to document the contribution of each.
\begin{itemize}
\item \emph{DAA TPM key generation}: Clear the TPM, generate a new EK and DAA key pair, and persist the DAA key in the TPM's NVRAM.
\item \emph{DAA TPM join w/o keygen}: Use the DAA key which is already in place and negotiate a group membership with the DAA issuer.
\item \emph{DAA TPM keygen \& join}: This combines the two steps above to give time measurements comparable to the join procedure without TPM.
\item \emph{DAA keygen \& join}: Generate the DAA key pair and save it to disk. Join the DAA group by negotiating a secret with the DAA issuer.
\item \emph{Digidow sensor capture}: Create an image using \texttt{bs-capture} and save it to disk.
\item \emph{Digidow sensor embed}: Extract a face embedding using the TensorFlow application \texttt{img2emb}.
\item \emph{Digidow sensor collect}: Collect the integrity log and save it to disk. Create a sha512sum of the file and put it together with all PCRs and the face embedding data into one message. Calculate another sha512sum of the message itself and save it to disk.
\item \emph{Digidow sensor send}: Sign the message's hash with the TPM DAA key and send it together with the message to the DAA verifier. The verifier saves message and hash for further procedures on its disk.
\end{itemize}

\subsection{Disk Usage}
In this early stage of the prototype, statistics about how much disk space is required to run this setup are not useful. All programs besides the face embedding application are rather small; the binaries themselves have a size of less than 100\,kB. The installation process is still manual, requiring a local build environment for C and Rust. Furthermore, the programs require a list of dependencies which need to be installed with the package manager. Hence, neither the size of the executables nor the total disk occupation is informative for production estimates. Similarly, the face embedding application should be seen as an example of a biometric sensor, making a detailed discussion about its time and space efficiency less meaningful.

\subsection{Memory Usage}
First, we look into the memory footprint of each part by executing it via \texttt{/usr/bin/time}, which measures the maximum resident set size in memory during the process lifetime, including stack, heap, and data section. \autoref{tab:memoryusage} shows the maximum usage of each task during 10000 runs in the different IMA configurations \texttt{off}, \texttt{fix}, and \texttt{enforcing}.
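A minimal sketch of such a measurement series: GNU time's \texttt{\%M} format reports the maximum resident set size in kB; \texttt{bs-capture} serves as the example task here, and the output file name is chosen arbitrarily.
\begin{lstlisting}[numbers=none]
# Record the maximum resident set size (kB) of each of 10000 runs.
for i in $(seq 1 10000); do
    /usr/bin/time -f "%M" -o rss.log -a bs-capture
done
sort -n rss.log | tail -n 1   # overall maximum
\end{lstlisting}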
\begin{table}
\renewcommand{\arraystretch}{1.2}
\centering
\caption{Maximum memory usage measured with \texttt{/usr/bin/time}}
\label{tab:memoryusage}
\begin{tabular}{lrr}
\toprule
\textit{Task [kB]} &\textit{System 1} &\textit{System 3} \\
\midrule
{DAA keygen \& join} &2,000 &2,024 \\
{DAA TPM keygen \& join} &2,344 &2,324 \\
{Digidow sensor capture} &3,748 &3,688 \\
{Digidow sensor embed} &809,616 &847,712\\
{Digidow sensor collect} &5,128 &5,172 \\
{Digidow sensor send} &2,604 &2,628 \\
\bottomrule
\end{tabular}
\end{table}
The memory allocation is constant for all parts in this table. Besides calculating the face embedding of the captured image, the whole transaction can be executed using a few megabytes of heap memory. This would fit on most embedded devices running a Linux kernel. However, the face embedding algorithm uses over 800\,MB and requires the majority of the computation time, as shown below.

\subsection{Memory Safety}
\label{ssec:memory-safety}
During these memory tests, valgrind showed a large number of possible memory leaks in the Python binary itself. The following example is executed:
\begin{lstlisting}[numbers=none]
root@amd1:~/jetson-nano-facerecognition# valgrind python3 img2emb.py data/test-images/test2.jpg
\end{lstlisting}
Valgrind ends with the following report:
\begin{lstlisting}[numbers=none]
==1648== HEAP SUMMARY:
==1648==     in use at exit: 32,608,730 bytes in 227,287 blocks
==1648==   total heap usage: 810,162 allocs, 582,875 frees, 1,385,416,573 bytes allocated
==1648==
==1648== LEAK SUMMARY:
==1648==    definitely lost: 3,144 bytes in 28 blocks
==1648==    indirectly lost: 0 bytes in 1 blocks
==1648==      possibly lost: 523,629 bytes in 12,842 blocks
==1648==    still reachable: 32,081,957 bytes in 214,416 blocks
==1648==      of which reachable via heuristic:
==1648==        stdstring : 537,414 bytes in 11,917 blocks
==1648==        newarray  : 8,920 bytes in 5 blocks
==1648==         suppressed: 0 bytes in 0 blocks
==1648== Rerun with --leak-check=full to see details of leaked memory
==1648==
==1648== Use --track-origins=yes to see where uninitialised values come from
==1648== For lists of detected and suppressed errors, rerun with: -s
==1648== ERROR SUMMARY: 58173 errors from 914 contexts (suppressed: 0 from 0)
\end{lstlisting}
This report shows that the Python binary (here Python 3.8 from Ubuntu 20.04) is not memory safe, which is a significant drawback for system and software integrity. Every binary directly involved in the DAA protocol frees all of its allocated blocks. Furthermore, every binary in the TPM2 software stack is memory safe according to valgrind. The shell commands used may not free every allocated block; however, valgrind still finds no errors in these programs.

\subsection{Performance}
\autoref{tab:wholeperformance} shows the time consumption for each task with its minimum, average, and maximum results over 10000 runs.
\begin{table}
\renewcommand{\arraystretch}{1.2}
\centering
\caption{Performance results of joining a DAA group and sending a Digidow transaction message (n=10000)}
\label{tab:wholeperformance}
\begin{tabular}{ll|ccc|ccc}
\toprule
\multicolumn{2}{l|}{\textit{Task [s]}} &\multicolumn{3}{c|}{\textit{System 1}} &\multicolumn{3}{c}{\textit{System 3}} \\
&&\rotatebox{90}{IMA off} &\rotatebox{90}{IMA fix} &\rotatebox{90}{IMA enf} &\rotatebox{90}{IMA off} &\rotatebox{90}{IMA fix} &\rotatebox{90}{IMA enf}\\
\midrule
\textit{DAA keygen \& join} &min &0.03 &0.03 &0.03 &0.03 &0.03 &0.03 \\
&avg &0.05 &0.05 &0.05 &0.03 &0.03 &0.03 \\
&max &0.07 &0.06 &0.06 &0.06 &0.06 &0.28 \\
&first &0.04 &0.07 &0.07 &0.04 &0.13 &0.04 \\\hline
\textit{DAA TPM keygen \& join} &min &0.33 &0.33 &0.33 &0.35 &0.35 &0.35 \\
&avg &0.34 &0.34 &0.34 &0.37 &0.37 &0.37 \\
&max &0.34 &0.36 &0.35 &0.37 &0.41 &0.40 \\
&first &0.37 &0.41 &0.41 &0.40 &0.42 &0.35 \\\hline\hline
\textit{Digidow sensor capture} &min &0.92 &0.91 &0.92 &0.91 &0.91 &0.91 \\
&avg &1.07 &1.05 &1.05 &1.06 &1.06 &1.06 \\
&max &1.14 &1.14 &1.14 &1.12 &12.48 &1.12 \\
&first &1.36 &1.42 &1.47 &1.34 &1.46 &1.45 \\\hline
\textit{Digidow sensor embed} &min &3.48 &3.51 &3.51 &4.07 &4.09 &4.10 \\
&avg &3.53 &3.53 &3.55 &4.12 &4.14 &4.14 \\
&max &4.11 &4.11 &4.09 &4.74 &4.46 &4.53 \\
&first &5.41 &19.93 &19.88 &5.99 &40.21 &40.23 \\\hline
\textit{Digidow sensor collect} &min &0.07 &0.14 &0.14 &0.09 &0.19 &0.19 \\
&avg &0.08 &n/a &n/a &0.10 &n/a &n/a \\
&max &0.09 &n/a &n/a &0.11 &n/a &n/a \\
&first &0.09 &0.18 &0.22 &0.11 &0.24 &0.25 \\\hline
\textit{Digidow sensor send} &min &0.25 &0.25 &0.25 &0.26 &0.27 &0.27 \\
&avg &0.25 &0.26 &0.26 &0.28 &0.27 &0.28 \\
&max &0.26 &0.27 &0.27 &0.28 &0.29 &0.29 \\
&first &0.26 &0.32 &0.32 &0.28 &0.40 &0.40 \\\hline\hline
\textit{Digidow sensor transaction} &min &4.75 &4.84 &4.85 &5.38 &5.50 &5.49 \\
&avg &4.92 &n/a &n/a &5.56 &n/a &n/a \\
&max &5.52 &n/a &n/a &6.14 &n/a &n/a \\
&first &7.12 &21.92 &21.89 &7.72 &42.31 &42.33 \\
\bottomrule
\end{tabular}
\end{table}
The \emph{first} run is stated separately because it is executed immediately after a system reboot, when the resources cached by the kernel are not loaded yet. The major delay in the first run is caused by the face embedding program, especially when IMA is enabled. As stated in \autoref{ssub:integrity_log}, each resource has to be hashed and extended into PCR 10 before access is granted, making the first access significantly longer. The TensorFlow application in particular requires significantly more time for its first run. With IMA set to enforcing, the kernel additionally manages access to the requested file. This decision does not require additional resources compared to the fixing mode: the file must be hashed in any case, and as long as the file \emph{integrity} is intact, PCR 10 and the integrity log file have to be written as well. Consequently, the only difference between fixing and enforcing mode is comparing the computed file hash with the value in the extended attributes and making the access decision based on that result. Since IMA measures every loaded resource, the corresponding log file grows constantly during testing. Unfortunately, the integrity log is required to collect the data for the Digidow attestation message. Furthermore, it is clear that every Digidow transaction contributes to the log since the handover between the individual tasks is done via files.
Consequently, we expected the runtime to depend on the number of previous runs, which is why the average and maximum runtimes in \autoref{tab:wholeperformance} are stated as not available (n/a) when IMA is enabled. The graphs in \autoref{fig:time-digidow-transaction} show the runtime of each of the runs on both tested systems with IMA in fixing and enforcing mode, respectively.
\begin{figure}
\centering
\includegraphics[width=0.48\textwidth]{../resources/plots/amd1-ima-fix.png}
\includegraphics[width=0.48\textwidth]{../resources/plots/intel2-ima-fix.png}
\includegraphics[width=0.48\textwidth]{../resources/plots/amd1-ima-enf.png}
\includegraphics[width=0.48\textwidth]{../resources/plots/intel2-ima-enf.png}
\caption{Time consumption of a Digidow transaction on the tested systems}
\label{fig:time-digidow-transaction}
\end{figure}
Each run is split into the four parts of a Digidow transaction. The graphs clearly show that our expectation of a linear relation between runtime and number of runs was not met. Collecting the integrity log appears to have a complexity of roughly $O(n^2)$ in the number of runs. Furthermore, it is interesting that System 1 with the newer AMD processor seems to be faster in the beginning, but when the number of runs reaches 10000, it needs significantly more time than System 3 with the Intel processor. Since the software setup on both systems is comparable (kernel version, Linux distribution, installed programs, setup with respect to \autoref{cha:implementation}), the reason for this difference can probably be found either in the microarchitectural implementation or in (less) optimized code for the AMD CPU.

When IMA is in fixing or enforcing mode, the corresponding log is filled with information about every accessed file. The numbers in \autoref{tab:imalogentries} are taken from the integrity log after 10000 Digidow transaction tests.
\begin{table}
\renewcommand{\arraystretch}{1.2}
\centering
\caption{Number of additional entries in the integrity log with IMA set to enforcing}
\label{tab:imalogentries}
\begin{tabular}{lrr}
\toprule
&\textit{System 1} &\textit{System 3} \\
\midrule
\textit{Root login after boot} &1912 &2159\\
\textit{Digidow sensor capture} &5 &5\\
\textit{Digidow sensor embed} &2561 &2162\\
\textit{Digidow sensor collect} &6 &6\\
\textit{Digidow sensor send} &12 &12\\
\textit{Every other Digidow transaction} &6 &6\\
\bottomrule
\end{tabular}
\end{table}
IMA was set to enforcing, and the DAA member key was already in the TPM. The root login is completed when the \texttt{inputrc} file is loaded; in the integrity log, this file appears at roughly line 2000. Unfortunately, neither the number nor the sequence of entries in the integrity log is predictable in this setup. The number depends on the services and daemons started during the boot sequence. Not every service is consistently loaded every time, and---depending on its individual state---the number of files loaded by a service may differ over several boot attempts. Predicting the sequence of entries in the log is currently not possible since the kernel takes advantage of the multicore CPU by starting services and daemons in parallel where possible. In the example of \autoref{tab:imalogentries}, System 3 loaded parts of the Python environment before root could log in; these resources were partly used by the TensorFlow application. Consequently, these two entries are very volatile and hence hard to predict. However, the contributions of capture, collect, and send as well as of every further Digidow transaction are consistent.
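The per-task counts in \autoref{tab:imalogentries} can be reproduced by comparing the log length before and after running a task; a minimal sketch, assuming the default securityfs mount point:
\begin{lstlisting}[numbers=none]
# Count the integrity log entries contributed by a single task.
LOG=/sys/kernel/security/ima/ascii_runtime_measurements
before=$(wc -l < "$LOG")
bs-capture                      # task under test
after=$(wc -l < "$LOG")
echo "$((after - before)) new entries"
\end{lstlisting}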
The six entries per Digidow transaction are the changing working files of each run and one file in \texttt{/tmp}. Thus, after 10000 runs of a Digidow transaction, the integrity log ended up with about 65000 entries, at which point the kernel already needs several seconds simply to output the file. In the current setup, the only reliable information from the integrity log is that if one of the programs was executed on the system, the corresponding entries must be somewhere in the log. Furthermore, a program or script must be uniquely identifiable by the single file defining it. \autoref{tab:imalogspeed} relates the IMA-induced slowdown of the first run to the number of newly created log entries, yielding a rough estimate of the time needed to create a single integrity log entry.
\begin{table}
\renewcommand{\arraystretch}{1.2}
\centering
\caption{Estimation of the performance of creating an integrity log entry}
\label{tab:imalogspeed}
\begin{tabular}{lrrr}
\toprule
&{slowdown [s]} &{log entries} &{per entry} \\
\midrule
\textit{System 1} &14.87 &2,162 &6.88\,ms\\
\textit{System 3} &34.24 &2,561 &13.43\,ms\\
\bottomrule
\end{tabular}
\end{table}

\section{Further Test Experiences}
When IMA is set to enforcing, some unexpected problems appeared while updating Ubuntu. During \texttt{apt upgrade}, the package manager downloads the deb packages into its cache folder at \texttt{/var/cache/apt/}. These files, however, do not have the \texttt{security.ima} attribute when the download is finished. The kernel prevents any access due to this missing attribute and breaks the upgrade process. It is not clear why these files are not hashed, although apt is run by root and every file created by root must be hashed according to the active IMA policy; creating a text file via a text editor, for example, shows exactly this behaviour. The missing attributes are also a problem when attempting to turn IMA off again. This requires an updated command line and thereafter an updated unified kernel. Generating the new kernel works fine, but moving the blob into the EFI partition fails due to the missing support for extended file attributes. The copy process seems to create the file, but it fails when trying to write the first blocks. As a result, the only way to update the kernel on this system is to boot a backup kernel with IMA set to \texttt{fix} or \texttt{off} and to move the file to the EFI partition from there.

Another relevant part of attestation is using the integrity log file to recalculate the produced hashes. On the one hand, this means recalculating the hash of each integrity log entry; on the other hand, the chain of hashes should result in the value held in PCR 10. A small script shown in \autoref{code:verifyimash} attempts this calculation.
\begin{lstlisting}[numbers=none,float, caption={Attempt to recalculate the value of PCR 10}, label={code:verifyimash}]
#!/usr/bin/bash
set -e
tpm2_pcrreset 16
cut -d ' ' -f 2 /sys/kernel/security/ima/ascii_runtime_measurements > ima.hashes
while read i ; do tpm2_pcrextend 16:sha1=$i ; done < ima.hashes
tpm2_pcrread sha1:10,16
\end{lstlisting}
It uses the debug register PCR 16, which is resettable without a reboot and has the same initial value as the first 10 PCRs. When IMA is off, the log holds only a single entry, the boot aggregate; in this case, the SHA1 value can be reproduced with the script. However, reproducing the PCR 10 value with IMA enabled was not possible. Our tests took into account that the PCR and the log file may be modified when loading the programs needed to read these resources for the first time. Reading the log at least twice eventually leads to a stable log and PCR value (neither changes anymore, even when reading the log again). The value of PCR 10 was still not reproducible.
Furthermore, the documentation of calculating these values did not mention how the sha256 hash in PCR 10 is calculated. \texttt{tpm2\_pcrextend} requires a sha256 hash as input for the corresponding PCR bank, but the integrity log only provides sha1 hashes. Hence, any verification procedures regarding the sha256 bank of PCR 10 are currently not implemented.
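The first verification step mentioned above, recalculating the hash of an individual log entry, can nevertheless be performed with standard tools. The following minimal sketch assumes the \texttt{ima-ng} log template, the default securityfs mount point, and that an entry for \texttt{/usr/bin/bash} exists in the log:
\begin{lstlisting}[numbers=none]
# Compare the file data hash of an ima-ng log entry with the file on disk.
# Entry format: "PCR template-hash ima-ng sha256:<hash> <path>"
LOG=/sys/kernel/security/ima/ascii_runtime_measurements
logged=$(grep ' /usr/bin/bash$' "$LOG" | head -n 1 | cut -d ' ' -f 4 | cut -d ':' -f 2)
actual=$(sha256sum /usr/bin/bash | cut -d ' ' -f 1)
[ "$logged" = "$actual" ] && echo "hash matches" || echo "hash mismatch"
\end{lstlisting}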