## Hardware Implementations

**A Custom Accelerator for Homomorphic Encryption Applications (PDF)**

I work on a custom hardware implementation of an NTT-based large polynomial multiplier, optimized for a class of reconfigurable logic, to bring NTRU-based FHE schemes one step closer to deployment in real-life applications. I implement the design on a high-end FPGA and connect it to a PC over a fast PCIe link so that it can serve as an accelerator for homomorphic applications. For homomorphic AES evaluation, it achieves a 125-times speedup over the software implementation.

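A polynomial multiplier of this kind follows the usual NTT pipeline: transform both operands, multiply pointwise, and transform back. The minimal software sketch below shows that pipeline. It assumes the well-known NTT-friendly prime 998244353 (primitive root 3) and a recursive radix-2 transform. These parameters and function names are illustrative only; they are not the accelerator's modulus, ring, or datapath. Zero-padding to a power of two makes the cyclic convolution equal the ordinary product, so `poly_mul_ntt([1, 2, 3], [4, 5])` returns `[4, 13, 22, 15]`, the coefficients of (1 + 2x + 3x^2)(4 + 5x).

```python
Q = 998244353  # NTT-friendly prime, Q = 119 * 2**23 + 1 (illustrative choice)
G = 3          # primitive root modulo Q


def ntt(a, root):
    """Recursive radix-2 Cooley-Tukey NTT over Z_Q; len(a) must be a power of two
    and root a primitive len(a)-th root of unity modulo Q."""
    n = len(a)
    if n == 1:
        return a[:]
    even = ntt(a[0::2], root * root % Q)
    odd = ntt(a[1::2], root * root % Q)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % Q              # butterfly: combine the two half-size transforms
        out[k] = (even[k] + t) % Q
        out[k + n // 2] = (even[k] - t) % Q
        w = w * root % Q
    return out


def poly_mul_ntt(a, b):
    """Multiply polynomials a and b (coefficient lists, lowest degree first) modulo Q."""
    result_len = len(a) + len(b) - 1
    size = 1
    while size < result_len:            # zero-pad so the cyclic convolution has no wrap-around
        size *= 2
    root = pow(G, (Q - 1) // size, Q)   # primitive size-th root of unity
    fa = ntt(a + [0] * (size - len(a)), root)
    fb = ntt(b + [0] * (size - len(b)), root)
    fc = [x * y % Q for x, y in zip(fa, fb)]        # pointwise product in the NTT domain
    c = ntt(fc, pow(root, Q - 2, Q))                # inverse NTT uses root^-1 ...
    size_inv = pow(size, Q - 2, Q)                  # ... and a final scaling by 1/size
    return [x * size_inv % Q for x in c[:result_len]]
```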
**Evaluating the Hardware Performance of a Million-bit Multiplier (PDF)**

I design and implement a million-bit multiplier in hardware with a small-footprint architecture written in Verilog HDL. The architecture is based on the Schönhage-Strassen algorithm and the Number Theoretic Transform (NTT). It contains an innovative cache architecture along with processing elements customized to match the computation and access patterns of the FFT-based recursive multiplication algorithm. Synthesized with the TSMC 90 nm library, the design runs at 666 MHz and matches the performance of a high-end 3 GHz Intel Xeon processor.

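NTT-based integer multipliers reduce big-integer multiplication to polynomial multiplication in three steps: split each operand into fixed-width digits, convolve the digit sequences, then propagate carries. The sketch below assumes 16-bit digits, a hypothetical choice. For brevity it computes the convolution with the schoolbook method; a Schönhage-Strassen/NTT design computes that same convolution with transforms. The hypothetical `max_coefficient` helper shows the sizing constraint this imposes. An NTT modulo a prime p only recovers convolution coefficients modulo p, so p must exceed the largest possible coefficient.

```python
DIGIT_BITS = 16              # hypothetical digit width
BASE = 1 << DIGIT_BITS


def to_digits(x):
    """Split a non-negative integer into little-endian base-2^16 digits."""
    digits = []
    while x:
        digits.append(x & (BASE - 1))
        x >>= DIGIT_BITS
    return digits or [0]


def convolve(a, b):
    """Schoolbook convolution; an NTT-based multiplier computes this step with transforms."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c


def propagate_carries(coeffs):
    """Turn convolution coefficients (which may exceed BASE) back into an integer."""
    result, carry = 0, 0
    for i, c in enumerate(coeffs):
        total = c + carry
        result |= (total & (BASE - 1)) << (i * DIGIT_BITS)   # keep the low digit
        carry = total >> DIGIT_BITS                           # push the rest upward
    return result | (carry << (len(coeffs) * DIGIT_BITS))


def big_mul(x, y):
    """Multiply two non-negative integers via digit convolution and carry propagation."""
    return propagate_carries(convolve(to_digits(x), to_digits(y)))


def max_coefficient(num_bits):
    """Worst-case convolution coefficient for two num_bits-bit operands."""
    n = -(-num_bits // DIGIT_BITS)   # number of digits, rounded up
    return n * (BASE - 1) ** 2
```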
**Accelerating Fully Homomorphic Encryption in Hardware (PDF)**

I design a custom architecture for realizing the Gentry-Halevi (GH) fully homomorphic encryption (FHE) scheme, presenting the first full realization of FHE in hardware. Building on a million-bit multiplier, I design additional hardware to realize the primitive functions of the GH-FHE scheme, i.e. encryption, decryption, and recryption. The design uses 30 million gates and achieves runtimes of 18.1 msec, 16.1 msec, and 3.1 sec for encryption, decryption, and recryption, respectively, compared to 1.8 sec, 20 msec, and 32 sec.

**Constructing Cluster of Simple FPGA boards for Cryptologic Computations (ARC)**

I design an FPGA cluster infrastructure for implementing cryptanalytic attacks and accelerating cryptographic operations. A server (PC) controls the cluster through a software interface over TCP/IP, so any FPGA can be added to the cluster simply by connecting it to the Internet. On the FPGA side, a MicroBlaze soft-core processor handles the TCP/IP protocol. I also introduce a feature called hardware switching, which makes full use of the functional hardware on the FPGA by dynamically swapping between that hardware and the MicroBlaze processor. The project involves hardware/software co-design of embedded systems, hardware modules for soft-core processors, and bootloaders that launch software applications at system startup.

**Design of a Mobile GPS Signal Source Locater using Antennas and DSPIC**

I was one of the designers in a two-person graduation project group for the Mobile GPS Signal Source Locater project. We design and implement a circuit board consisting of a microcontroller, two patch antennas, and an LCD screen that displays the location of the GPS signal source. The result is a low-cost, easy-to-use, mobile device for locating GPS signals.

## Software Implementations

**MMSAT: A Scheme for Multimessage Multiuser Signature Aggregation (PDF)**

We propose a new post-quantum (PQ) signature aggregation scheme that supports aggregating signatures on multiple messages from multiple users. This is a key optimization for blockchains, where it reduces the size of the signatures stored in blocks. We also provide a fast and flexible library that implements the scheme.

**Flattening NTRU for Evaluation Key Free Homomorphic Encryption (PDF)**

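Flattening, the GSW technique that F-NTRU adopts, rests on bit decomposition. A vector is rewritten with entries in {0, 1} without changing its inner product against the powers-of-two expansion of the secret, which bounds noise growth during homomorphic multiplication. The sketch below shows GSW-style helpers on plain integer vectors modulo a small hypothetical modulus `Q`. F-NTRU applies the idea to NTRU polynomials, which this toy version does not model.

```python
Q = 2**10 + 7            # hypothetical small modulus
ELL = Q.bit_length()     # bits needed per entry


def bit_decomp(a):
    """(a_1..a_k) -> bits of each a_i, least significant first; length k*ELL, entries in {0,1}."""
    return [(x >> j) & 1 for x in a for j in range(ELL)]


def bit_decomp_inv(bits):
    """Inverse of bit_decomp; also accepts entries outside {0,1} and reduces mod Q."""
    return [sum(bits[i * ELL + j] << j for j in range(ELL)) % Q
            for i in range(len(bits) // ELL)]


def powers_of_2(b):
    """(b_1..b_k) -> (b_1, 2 b_1, ..., 2^(ELL-1) b_1, ..., 2^(ELL-1) b_k) mod Q."""
    return [(x << j) % Q for x in b for j in range(ELL)]


def flatten(v):
    """Binary vector with the same inner product against powers_of_2(s) as v."""
    return bit_decomp(bit_decomp_inv(v))


def dot(u, v):
    return sum(x * y for x, y in zip(u, v)) % Q
```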
I propose F-NTRU, a new FHE scheme that adopts the flattening technique proposed in GSW to derive an NTRU-based scheme that, like GSW, requires neither evaluation keys nor key switching. My C++ implementation shows fast evaluation times compared to existing schemes while eliminating the need to store and manage costly evaluation keys.

**Homomorphic Autocomplete (PDF)**

I develop a homomorphic autocomplete scheme that is efficient in both communication and computation. I model the autocomplete problem in the client/server setting and then convert it into a homomorphic circuit. I also provide a C++ implementation based on LTV-FHE that runs in under a second.

**Arithmetic Using Word-wise Homomorphic Encryption (PDF)**

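Word-wise schemes natively provide only addition and multiplication on plaintexts, so division and zero tests must be rewritten in those terms. Below is a plaintext sketch of two standard rewrites, under stated assumptions. The first is Goldschmidt-style convergence division on real operands pre-scaled into [0.5, 1). The second is a zero test based on Fermat's little theorem for a prime plaintext modulus p. Both illustrate the kind of algorithm described here; the paper's exact algorithms and encodings may differ.

```python
def goldschmidt_divide(n, d, iterations=5):
    """Approximate n / d for d in [0.5, 1) using only additions and multiplications.

    Each step multiplies numerator and denominator by F = 2 - D, so the
    error 1 - D shrinks quadratically: (1 - D) -> (1 - D)**2.
    """
    if not 0.5 <= d < 1:
        raise ValueError("d must be pre-scaled into [0.5, 1)")
    num, den = n, d
    for _ in range(iterations):
        f = 2 - den
        num *= f
        den *= f
    return num


def is_zero_mod_p(x, p):
    """Fermat-based zero test for prime p: returns 1 iff x == 0 (mod p), else 0.

    Only multiplications are needed (x^(p-1) via square-and-multiply),
    which is what makes this form usable under word-wise encryption.
    """
    return (1 - pow(x, p - 1, p)) % p
```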
I work on solutions to challenging problems in arithmetic over data encrypted word-wise with a somewhat homomorphic encryption (SWHE) scheme, including homomorphic division and comparison. I introduce convergence-based iterative algorithms for division and comparison, as well as an algebraic technique for zero checking and thresholding.

**On-the-fly Homomorphic Batching/Unbatching (PDF)**

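With batching, the message slots are the values of the plaintext polynomial at the roots of the ring polynomial. Moving coefficients into slots is therefore an NTT over the message space, and moving them back is an inverse NTT. The plaintext sketch below assumes the toy ring Z_17[x]/(x^4 + 1); the constants `T`, `N`, and `ZETA` are hypothetical, and Python 3.8+ is needed for modular inverses via `pow`. It shows the key design point: slot-wise multiplication corresponds to polynomial multiplication modulo x^4 + 1. The work described here computes such transforms homomorphically on encrypted polynomials, which this sketch does not attempt.

```python
T = 17      # hypothetical plaintext modulus with T = 1 (mod 2N)
N = 4       # hypothetical ring degree: plaintext ring Z_T[x]/(x^N + 1)
ZETA = 9    # primitive 2N-th root of unity mod T (9^4 = -1 mod 17)
ROOTS = [pow(ZETA, 2 * i + 1, T) for i in range(N)]   # the N roots of x^N + 1


def coeffs_to_slots(c):
    """Batching: evaluate the message polynomial at each root (a negacyclic NTT)."""
    return [sum(cj * pow(r, j, T) for j, cj in enumerate(c)) % T for r in ROOTS]


def slots_to_coeffs(s):
    """Unbatching: inverse transform, c_j = N^-1 * sum_i s_i * r_i^-j (mod T)."""
    n_inv = pow(N, -1, T)
    return [n_inv * sum(si * pow(r, -j, T) for si, r in zip(s, ROOTS)) % T
            for j in range(N)]


def negacyclic_mul(a, b):
    """Schoolbook multiplication in Z_T[x]/(x^N + 1): x^N wraps around to -1."""
    out = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < N:
                out[k] += ai * bj
            else:
                out[k - N] -= ai * bj
    return [x % T for x in out]
```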
I tackle the problem of computing an NTT homomorphically over the domain defined by the message space. Building on this homomorphic NTT, I define homomorphic batching/unbatching, which moves the coefficients of encrypted message polynomials into message slots and vice versa. I present concrete performance figures for all proposed NTT primitives.

**Depth Optimized Efficient Homomorphic Sorting (Springer)**

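Depth stays low when every pairwise comparison is computed in parallel and each element's rank is derived as a sum of comparison bits, rather than through a sequence of data-dependent swaps. The sketch below shows this comparison-matrix idea on plaintext integers. The function names and the index-based tie-breaking rule are assumptions. In the encrypted setting, each comparison, equality test, and product would itself be a homomorphic circuit. The sketch illustrates the approach, not the exact Direct Sort circuit.

```python
def precedes(a, i, b, j):
    """1 if element a (at index i) must come before element b (at index j).
    Ties are broken by index so every element gets a distinct rank (an assumption)."""
    return 1 if a < b or (a == b and i < j) else 0


def direct_sort(xs):
    """Rank-based sort: all comparisons are independent, so they could run in parallel."""
    n = len(xs)
    # Comparison matrix: m[i][j] = 1 when element j precedes element i.
    m = [[precedes(xs[j], j, xs[i], i) if i != j else 0 for j in range(n)]
         for i in range(n)]
    # Rank of element i = number of elements preceding it (a Hamming weight).
    ranks = [sum(row) for row in m]
    # Output position k selects the element whose rank equals k, as a sum of products.
    return [sum(xs[i] * (1 if ranks[i] == k else 0) for i in range(n)) for k in range(n)]
```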
I work on a scheme to efficiently sort encrypted data using the LTV-FHE scheme. First, I examine existing classical sorting algorithms and conclude that their high circuit depth makes them unsuitable for FHE implementation. I then propose Direct Sort, a sorting algorithm with the smallest circuit depth, which achieves an amortized runtime of 6 minutes for 64 elements.

**Homomorphic AES Evaluation Using the Modified LTV Scheme (PDF)**

In this work, I develop a customized implementation of the LTV-FHE scheme, optimized using NTT methods, precomputed tables, and optimal parameter selection. I introduce a specialization of the ring structure that reduces the public key size from 87 GB to 6.13 GB. Finally, I implement a homomorphic version of AES with a runtime of 50 seconds, a 5.8-times speedup over the BGV-based AES implementation.

**Toward Practical Homomorphic Evaluation of Block Ciphers Using Prince (PDF)**

I use the previously built LTV-FHE library to evaluate the lightweight block cipher PRINCE homomorphically. Because PRINCE has a low-depth circuit, performance improves significantly over the AES-FHE implementation: evaluating one PRINCE block takes 3.3 seconds, compared to 50 seconds for AES-FHE.

**Bandwidth Efficient PIR from NTRU (PDF)**

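One way to relate a depth-5 circuit to a database of about 4 billion rows: 2^32 ≈ 4.3 billion, so a row index has 32 bits. Testing equality between the query index and a row index multiplies 32 bit-equalities, which a balanced tree does in log2(32) = 5 levels. This selection circuit is an illustrative assumption, not a description of the scheme's exact circuit. The sketch below is a plaintext model with hypothetical names; in the real scheme the query bits would be encrypted.

```python
def xnor(q_bit, r_bit):
    """Bit equality over GF(2): 1 + q + r (mod 2); needs no multiplication."""
    return (1 + q_bit + r_bit) % 2


def tree_product(factors):
    """Multiply bits pairwise in a balanced tree; return (product, multiplicative depth)."""
    depth = 0
    while len(factors) > 1:
        paired = [factors[i] * factors[i + 1] % 2 for i in range(0, len(factors) - 1, 2)]
        if len(factors) % 2:           # an odd factor out is carried to the next level
            paired.append(factors[-1])
        factors = paired
        depth += 1
    return factors[0], depth


def bits(value, width):
    """Little-endian bit list of value, width bits long."""
    return [(value >> b) & 1 for b in range(width)]


def pir_answer(database, query_index, width):
    """Plaintext model of a selection-based PIR answer: XOR over rows of selector * value."""
    q = bits(query_index, width)       # hypothetical: these bits would be encrypted
    answer, depth = 0, 0
    for row, value in enumerate(database):
        sel, depth = tree_product([xnor(qb, rb) for qb, rb in zip(q, bits(row, width))])
        answer ^= sel * value          # addition over GF(2), bitwise
    return answer, depth
```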
I develop a private information retrieval (PIR) scheme by customizing the LTV scheme into an NTRU-based somewhat homomorphic encryption (SWHE) scheme. The scheme evaluates a depth-5 circuit, which suffices to retrieve data from a database with 4 billion rows. Although it is not as efficient as classical PIR schemes, it achieves much lower bandwidth costs, e.g. 1000 times smaller for large databases.

**Multi-core Implementation of the Tate Pairing using Cell Blade**

I implement the Tate pairing on an IBM Cell Blade processor in C/C++. The design is optimized and parallelized to exploit the multi-core architecture of the Cell processor. Beyond parallelizing across cores, I also implement the operations with Single Instruction, Multiple Data (SIMD) instructions, achieving higher throughput for the Tate pairing.

## Source Code

The source code for these projects is available via the GitHub link here.