Embedded engineers at MetroScientific build efficient and secure embedded systems including wearables, Wi-Fi routers, sensors, home appliances, automotive solutions, and more. Our extended teams deliver full-cycle development services and build embedded solutions of any complexity.
MetroScientific expertise covers hardware and firmware development for multiple embedded projects. Our engineers create reliable software solutions that fit complex technical requirements of top-notch hardware manufacturers. We ensure embedded software and hardware components successfully handle multiple internal and external factors affecting their performance.
Experts from different fields at MetroScientific join forces to deliver unique IoT solutions. We have accumulated extensive experience with cloud computing and big data platforms. MetroScientific software architects combine this knowledge with our full-cycle embedded software development capabilities to deliver custom IoT solutions to global clients.
Our software engineers will help you navigate the multitude of IoT-related technologies and select the ones that fit your project’s requirements. We integrate hardware, firmware, and data management components. Our dedicated development teams design secure and efficient embedded systems as well as the supporting software infrastructure.
SmartSSD Computational Storage Drive
Computational storage drives (CSDs) process data within the device, reducing the need for data movement. Samsung’s SmartSSD is a high-performance CSD with FPGA acceleration. Existing studies focus on reducing data movement but overlook implementation techniques for optimizing CSD performance. A case study using SmartSSD demonstrated up to 20X improvement in data verification time with the right combination of parallel processing techniques. Understanding and applying parallel processing techniques are crucial for effective CSD utilization, impacting performance (up to 59% variation) and resource utilization (up to 20% difference).
The hardware architecture of Samsung’s SmartSSD platform is depicted in Figure 1. The CSD includes a Xilinx FPGA, 8 GB DRAM, and 1 TB NAND flash memory. The FPGA’s DRAM acts as a buffer for in-storage data processing, such as compression. SmartSSD can function as a standard SSD when no in-storage processing is needed. Data transfer between the host memory, FPGA, and NAND flash is facilitated by a PCIe switch. Two modes are available: normal mode for transfers between host memory and NAND flash, and peer-to-peer (P2P) mode for transfers between NAND flash and FPGA DRAM. The Xilinx OpenCL framework controls in-storage processing and data manipulation in P2P mode.
Figure 1. Samsung SmartSSD Hardware Architecture
Techniques for Concurrent Processing within CSD
In this study, several techniques were used in CSDs for concurrent data processing:
1. Array Partitioning: It divides large arrays into smaller arrays or individual elements to improve data access times. SmartSSD supports block partitioning, cyclic partitioning and complete partitioning.
2. Loop Unrolling: It reduces the number of loop iterations, enabling parallel execution of independent operations. However, it increases code size and required registers.
3. Multiple Computing Units: Multiple instances of a computing unit can be created at compile-time in SmartSSD. The number of instances can be specified, but the maximum is determined by available resources and efficient utilization of those resources.
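The first two techniques can be sketched in C++ in the style used for high-level synthesis. This is a minimal illustration under stated assumptions, not the kernel used in the study: the source does not describe the verification algorithm, so a simple byte-sum checksum stands in for it, and the names `PAGE_BYTES`, `FACTOR`, `page_checksum`, and `verify_page` are hypothetical. The `#pragma HLS` lines follow the commonly used Vitis HLS form (exact option spelling may vary by tool version); an ordinary C++ compiler ignores them, so the function also runs on a host for testing.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sizes chosen for illustration only.
constexpr std::size_t PAGE_BYTES = 4096;  // assumed page size
constexpr std::size_t FACTOR = 8;         // assumed partition/unroll factor

// Sums all bytes of one page into a 32-bit checksum (a stand-in for the
// unspecified verification algorithm).
std::uint32_t page_checksum(const std::uint8_t* page) {
    // Local buffer, split cyclically into FACTOR banks so that FACTOR
    // consecutive elements can be read in the same cycle.
    std::uint8_t buf[PAGE_BYTES];
#pragma HLS ARRAY_PARTITION variable=buf cyclic factor=8 dim=1

    for (std::size_t i = 0; i < PAGE_BYTES; ++i) {
        buf[i] = page[i];
    }

    // One partial sum per lane, so the unrolled iterations are independent.
    std::uint32_t partial[FACTOR] = {0};
#pragma HLS ARRAY_PARTITION variable=partial complete dim=1

    for (std::size_t i = 0; i < PAGE_BYTES; i += FACTOR) {
        for (std::size_t j = 0; j < FACTOR; ++j) {
#pragma HLS UNROLL
            partial[j] += buf[i + j];
        }
    }

    // Combine the per-lane partial sums.
    std::uint32_t sum = 0;
    for (std::size_t j = 0; j < FACTOR; ++j) {
        sum += partial[j];
    }
    return sum;
}

// Returns true when the page matches the expected checksum.
bool verify_page(const std::uint8_t* page, std::uint32_t expected) {
    return page_checksum(page) == expected;
}
```

The partition factor matches the unroll factor on purpose: unrolling only helps if the memory can supply that many elements per cycle, which is what partitioning provides. Both come at the cost of more on-chip resources.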
Figure 2. (a) Single-page data verification time; (b) data verification time comparison between combinations of parallel processing techniques
The experiments were performed on an Intel Xeon machine with 16 cores and 160 GB memory. A SmartSSD with 4 GB FPGA DRAM and 3.84 TB NAND flash memory was used. The data verification algorithm was executed within the SmartSSD. Different combinations of parallel processing techniques were tested, comparing implementations with and without parallel processing. The results were obtained from 5 independent runs with standard deviations below 6.6% of the mean.
Single-Page Data Verification: In the comparison of parallel processing techniques, the sequential implementation (a degree of parallelism of one) performed best. The longer execution times of the parallel techniques were attributed to initialization overhead such as data copying and loading time. For small data sizes, sequential processing was therefore more efficient than array partitioning or multiple computing units. Among the parallel options, however, applying array partitioning within a single computing unit (Figure 2(a)) could perform better than using multiple computing units for small data sizes.
Multiple-Page Data Verification: Figure 2(b) illustrates the impact of different combinations of parallel processing techniques on the data verification time for a single 1 GB file. The results show that the choice of techniques can reduce the overall verification time by up to 59%. When array partitioning has high parallelism but a low number of computing units, the verification time tends to be longer. This suggests that utilizing multiple computing units is more efficient than increasing parallelism within a single unit. As the data size increases, the initialization overhead observed in single-page verification becomes less significant due to the advantages of parallel processing.
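How a multi-page workload might be divided among several computing units can be sketched as follows. This is only a host-side model under assumptions: the source does not say how pages were assigned to units, so an even split into contiguous ranges is assumed, and `PageRange` and `split_pages` are hypothetical names. On the device, each range would be handled by a separate kernel instance running concurrently.

```cpp
#include <cstddef>
#include <vector>

// A contiguous run of pages assigned to one computing unit (hypothetical type).
struct PageRange {
    std::size_t first;  // index of the first page in the range
    std::size_t count;  // number of pages in the range
};

// Splits num_pages into num_units contiguous ranges whose sizes differ by
// at most one page. Returns an empty vector when num_units is zero.
std::vector<PageRange> split_pages(std::size_t num_pages, std::size_t num_units) {
    std::vector<PageRange> ranges;
    if (num_units == 0) {
        return ranges;
    }
    const std::size_t base = num_pages / num_units;   // pages every unit gets
    const std::size_t extra = num_pages % num_units;  // leftover pages
    std::size_t next = 0;
    for (std::size_t u = 0; u < num_units; ++u) {
        // The first `extra` units take one additional page each.
        const std::size_t count = base + (u < extra ? 1 : 0);
        ranges.push_back({next, count});
        next += count;
    }
    return ranges;
}
```

For example, assuming 4 KiB pages, a 1 GiB file holds 262,144 pages, so four computing units would each receive 65,536 pages under this scheme.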
In summary, the Samsung SmartSSD CSD is an adaptable computational storage platform that empowers software developers to easily build innovative hardware-accelerated solutions in familiar high-level languages. It provides massive performance gains and dense, linear scalability by pushing compute to where the data lives, making it an ideal solution for data-intensive applications in a rapidly growing global datasphere.
Firmware Development
Big data analysis uses C/C++ for complex machine learning and ensemble algorithms. With large data sets (several terabytes to petabytes), it can complete calculations in reasonable wall-clock time, although parallel processing techniques or cluster computing are often required. Programs and languages such as MATLAB, R, and Python use C/C++ 'under the hood' for some of their computations, and many bioinformatics programs are also written in C or C++.
C++ is generally not used to write the interface or surface-level code of training applications. Instead, it is used to build the underlying algorithms and libraries that the main learning framework calls during execution. In other words, even when the application code itself is not written in C++, it depends on libraries built with C++. C++ is also used to write libraries and functions with specialized utilities that may not be available in an existing package.
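A minimal sketch of this pattern is a numeric routine written in C++ and exposed with C linkage, so that a higher-level language can call it through a foreign-function interface. The function name `dot_product` is hypothetical and not taken from any particular library.

```cpp
#include <cstddef>

// extern "C" disables C++ name mangling, so the symbol can be looked up
// by its plain name from another language's foreign-function interface.
extern "C" double dot_product(const double* a, const double* b, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];  // accumulate element-wise products
    }
    return sum;
}
```

Built as a shared library (for example with `g++ -O2 -shared -fPIC`), a function like this could be loaded from Python with the standard `ctypes` module, with the Python code acting as the interface layer and the C++ code doing the computation.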