bsuir

Доклады БГУИР

Doklady BGUIR

1729-7648, 2708-0382

БГУИР

10.35596/1729-7648-2021-19-5-86-93

bsuir-3141

Research Article

ELECTRONICS, RADIOPHYSICS, RADIOENGINEERING, INFORMATICS

Architecture of a discrete cosine transform processor for lossless-to-lossy image compression systems

Ключеня

В. В.

Kliuchenia

V. V.

Kliuchenia Vitaly V., PhD, Associate Professor at the Electronic Computing Department

220013, Minsk, P. Brovka str., 6

vitaly.kliuchenia@gmail.com

Belarusian State University of Informatics and Radioelectronics

2021

26.08.2021

19(5):86-93

2021

Ключеня В.В.

Kliuchenia V.V.

This work is licensed under a Creative Commons Attribution 4.0 License.

https://doklady.bsuir.by/jour/article/view/3141

Hardware implementations of fixed-point discrete cosine transform (DCT) blocks, known as IntDCT [1] and BinDCT [2], raise several design questions. One of the main ones is the choice between implementing the transform on an FPGA or on a digital signal processor (DSP). Each option has its advantages and drawbacks. One of the most important advantages of a DSP implementation is the availability of special instructions, in particular the ability to multiply two numbers in one clock cycle. With the advent of DSPs, the limitation on the number of multiplications in algorithms was therefore removed. On the other hand, an FPGA implementation need not be constrained by the data word length (within reasonable limits), can process all incoming data in parallel, and allows specialized computing cores to be built for different tasks. In essence, designing multimedia systems on FPGAs resembles designing similar systems with small- and medium-scale integration logic. Such an implementation has the same limitations: a relatively small amount of available memory, the need to design basic building blocks (multipliers, dividers), and so on. It is the unequal cost of addition and multiplication on FPGAs that motivated the search for DCT algorithms with the smallest number of multiplications. Even this is not enough, however: because a multiplier is many times more complex than an adder, it became necessary to look for ways to compute the transform without any multiplications at all.
This article shows how integer forward and inverse DCTs and distributed arithmetic can be combined to create a new universal multiplierless architecture of the decorrelating transform on FPGAs for transform-based image coding systems that operate on the lossless-to-lossy (L2L) principle, and how to obtain better experimental results in terms of hardware resources than comparable compression systems.
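The idea of replacing multipliers with lookups, shifts, and additions can be illustrated with a minimal software sketch of distributed arithmetic in the spirit of the tutorial in [6]. This is not the architecture proposed in the article: the function names, the coefficient values, and the 8-bit word length are assumptions chosen only for illustration. The inner product of a fixed coefficient vector with two's-complement inputs is computed bit plane by bit plane, using a precomputed table of partial coefficient sums (a ROM or LUT in hardware).

```python
def build_lut(coeffs):
    """Precompute every partial sum of the fixed coefficients.

    Entry `addr` holds the sum of coeffs[k] over all k whose bit is set
    in `addr`. In hardware this table would live in a ROM or in LUTs.
    """
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]


def da_inner_product(lut, xs, bits):
    """Compute sum(coeffs[k] * xs[k]) with lookups, shifts, and adds only.

    `xs` must be signed integers representable in `bits`-bit
    two's complement.
    """
    mask = (1 << bits) - 1
    words = [x & mask for x in xs]      # two's-complement bit patterns
    acc = 0
    for j in range(bits):
        # Gather bit j of every input into one table address.
        addr = 0
        for k, w in enumerate(words):
            addr |= ((w >> j) & 1) << k
        term = lut[addr] << j           # weight of bit plane j is 2**j
        # The most significant bit of two's complement has negative weight.
        acc += -term if j == bits - 1 else term
    return acc


# Hypothetical fixed-point coefficients and 8-bit input samples.
coeffs = [91, -35, 12, 7]
lut = build_lut(coeffs)
xs = [5, -3, 100, -128]
assert da_inner_product(lut, xs, 8) == sum(c * x for c, x in zip(coeffs, xs))
```

The table grows as 2^N with the number of inputs N, so in practice the inputs are often split into smaller groups with separate tables; the sketch keeps a single table for clarity.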

DCT; discrete cosine transform; L2L; lossless-to-lossy; architecture; FPGA (Field-Programmable Gate Array); block staircase structural parameterization; BLSP

References

1. Suzuki T. Integer DCT Based on Direct-Lifting of DCT-IDCT for Lossless-to-Lossy Image Coding. IEEE Transactions on Image Processing. November 2010;19(11):2958-2965.

2. Dang P.P. BinDCT and Its Efficient VLSI Architectures for Real-Time Embedded Applications. Journal of Imaging Science and Technology. March/April 2005;49(2):124-137.

3. Suzuki T. Integer fast lapped transforms based on direct-lifting of DCTs for lossy-to-lossless image coding. EURASIP Journal on Image and Video Processing. 2013;1:1-9.

4. Suzuki T. Realization of lossless-to-lossy image coding compatible with JPEG standard by direct-lifting of DCT-IDCT. Proceedings of the 17th IEEE Intern. Conf. on Image Processing (ICIP’2010). Hong Kong. 26–29 Sept.; 2010: 389-392.

5. Chen Y.H. A High-Throughput and Area-Efficient Video Transform Core With a Time Division Strategy. IEEE Trans. VLSI Syst. 2014;22(11):2268-2277.

6. White S.A. Applications of Distributed Arithmetic to Digital Signal Processing: A Tutorial Review. IEEE ASSP Magazine. 1989;6(3):4-19.

7. Chen Y.H. High throughput DA-based DCT with high accuracy error-compensated adder tree. IEEE Trans. VLSI Syst. Apr. 2011;19(4):709-714.

8. Low-power and high-quality Cordic-based Loeffler DCT for signal processing. IET Circuits, Devices & Systems. December 2007;1:453-461.

9. Tumeo A. A pipelined fast 2D-DCT accelerator for FPGA-based SoCs. In Proc. IEEE Comput. Soc. Annu. Symp. VLSI. 2007: 331-336.

The authors declare that there are no conflicts of interest present.