\documentclass[%showframe%
]{oclh_doc}
\usepackage{fix-cm}% fix of Computer Modern font DO NOT MOVE!!!
\usepackage{calc}% simple arithmetic in expressions
\usepackage{etoolbox}% some Latex interfaces
\usepackage{longtable,multirow}% tables
\usepackage{adjustbox,%
            placeins, % processing of float objects
            caption}
%
%
% page geometry
%
\usepackage[
 top = 0.06734007\paperheight,
 bottom = 0.06734007\paperheight,
 right = 0.1190476\paperwidth,
 left = 0.0952381\paperwidth
]{geometry}
%
% language and fonts
%
\usepackage{amsmath,amsfonts,amssymb,xfrac}
\usepackage[cmintegrals,cmbraces]{newtxmath}
\usepackage{xltxtra,polyglossia,csquotes}
\usepackage{verbatim,fancyvrb,framed}
\usepackage{relsize}
\setmainlanguage{english}
\setotherlanguage{russian}
\setkeys{russian}{babelshorthands=true}
\defaultfontfeatures{Scale=MatchLowercase,Mapping=tex-text}
\input{fonts/font_settings-IBM_Plex.tex}
% \input{fonts/font_settings-Computer_Modern.tex}
% \input{fonts/font_settings-my_fonts.tex}
%
% paragraphs and align
%
\usepackage{indentfirst}
\frenchspacing\sloppy\raggedbottom
\setlength{\parindent}{0.0604762\paperwidth}%
%%%%% additional colontitles settings
\RequirePackage{fancyhdr}
\fancyhf{}\renewcommand{\headrulewidth}{0pt}
\fancyfoot{}
\fancyfoot[R]{\thepage}
\pagestyle{fancy}
%
% index
%
\usepackage[xindy]{imakeidx}
\makeindex[options = -C utf8 -M texindy -M index_style_and_order ]
%
% references
%
\RequirePackage{hyperref}
\RequirePackage{xcolor}
\definecolor{DarkBlue}{rgb}{0,0,0.5}
\hypersetup{colorlinks=true, linkcolor={black}, urlcolor =DarkBlue, citecolor={black}}
%
% lists
%
\usepackage{enumitem}
\newenvironment{CodePar}%
 {\Verbatim[samepage=true,frame=single]}%
 {\endVerbatim}%
%
\newenvironment{CodeParWithCC}[1]
 {\Verbatim[samepage=true,frame=single,commandchars=#1]}
 {\endVerbatim}
%
\newenvironment{ImpNote}
 {\setlength{\parindent}{0.0604762\paperwidth}%
  \setlength{\LTleft}{\parindent}%
  \setlength{\LTpre}{\parsep}\setlength{\LTpost}{\parsep}%
  \begin{longtable}{|p{\linewidth-\tabcolsep-1\parindent}}
  \noindent\textsf{Important warning:}}
 {\end{longtable}}
%
\newcommand{\SetLenVarWithWidth}[2]{%
 \ifdefined #1 \else%
  \newlength{#1}%
 \fi%
 \settowidth{#1}{#2}%
}%
\newcommand{\NmCnvDescript}%
 {\addtolength{\leftskip}{0.0604762\paperwidth}%
  \setlength{\parindent}{-0.0604762\paperwidth}}%
%
\newcommand{\verbI}[1]{\textit{\Verb|#1|}}%
\newcommand{\verbIU}[1]{\underline{\smash{\textit{\Verb|#1|}}}}
\title{OpenCL\_helpers library}
\author{hk@r4in.tk\\ mns@r4in.tk}
\makeindex
\begin{document}
\maketitle
\section{Introduction}
The OpenCL\_helpers library is designed to simplify the programming of multithreaded applications that use GPGPU (general-purpose computing on graphics processing units). The library does not cover every need of GPGPU programming; it was written to simplify the development of applications that use multilevel (hierarchical) parallelism on a single computer. In other words, it allows a task to be split into threads on the host processor (CPU) and then split further inside each GPGPU device by dividing the GPU threads into squads that perform different subtasks. The library was designed on the assumption that each CPU thread uses its own GPGPU device, but using the same GPGPU device from two or more CPU threads is not prohibited. Parallelism at the CPU level is provided by POSIX Threads, and parallelism at the GPGPU level is provided by the library.

The library consists of four parts. Not all of them are directly related to parallelism, and they can be used independently of each other.

The first part is the OpenCL program build tools~(s.\ref{sec:buildutils}). The build tools make it possible to build OpenCL programs from the command line and view the build result without writing and running a separate application.
The second part~(s.\ref{sec:libraryusing}) comprises the CPU functions of the OpenCL\_helpers library and the syntax that makes it possible to write header files shared by CPU and GPGPU programs.

The third part~(s.\ref{sec:memalloc}) makes up for the lack of memory management tools in OpenCL C and provides tools for GPGPU memory allocation and deallocation, heap diagnostics and pointer reinterpretation.

The fourth part~(s.\ref{sec:squadmodel}) comprises the GPGPU functions that organize parallelism inside the GPGPU by dividing the whole set of GPGPU threads into squads, each of which performs its own task.
\subsection{Build and install}
\label{sec:build_n_install}
\subsubsection{Prerequisites}
\label{subsec:prerequisites}
It is assumed that you have:
\begin{enumerate}[leftmargin=2\parindent]
 \item A computer running Linux.
 \item An installed, working C compiler.
 \item The standard C library (libc), installed and available.
 \item An OpenCL implementation, version 1.2 or later, from any vendor.
\end{enumerate}
\subsubsection{Getting the source code of the OpenCL\_helpers library}
\label{subsec:getting_source}
A copy of the source code of the OpenCL\_helpers library can be downloaded from
\href{https://ggs.void.r4in.tk/hk/OpenCL_helpers/archive/master.tar.gz}%
{\Verb|https://ggs.void.r4in.tk/hk/OpenCL\_helpers/archive/master.tar.gz|}
and then unpacked into a suitable directory. Alternatively, if the Git version control system~(\href{https://git-scm.com}%
{\Verb|https://git-scm.com|}) is installed, a copy of the source code of the library can be obtained with the following command:
\par
\indent\indent\verb|git clone |%
\href{https://ggs.void.r4in.tk/hk/OpenCL_helpers.git}%
{\Verb|https://ggs.void.r4in.tk/hk/OpenCL\_helpers.git|}
\subsubsection{Build}
To build with the default settings, change to the \verb|OpenCL_helpers| directory and run the command\par
\indent\indent\verb|make|\par
\noindent%
The \verb|-j| option of \verb|make| may be used for a parallel build.
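
As a hedged illustration of the parallel build mentioned above, the following shell sketch runs one \verb|make| job per available processor. The use of \verb|nproc| (GNU coreutils) and the relative path to the \verb|OpenCL_helpers| directory are assumptions of this sketch, not requirements of the library.

```shell
#!/bin/sh
# Sketch only: one make job per available processor, falling back to a
# single job when nproc (GNU coreutils, an assumption here) is missing.
JOBS="$(nproc 2>/dev/null || echo 1)"

# Assumption: the source tree was unpacked or cloned into ./OpenCL_helpers.
if [ -d OpenCL_helpers ]; then
    (cd OpenCL_helpers && make -j"${JOBS}")
else
    echo "OpenCL_helpers not found here; the build command would be: make -j${JOBS}"
fi
```

With GNU \verb|make|, passing \verb|-j| without a number places no limit on the number of simultaneous jobs, so an explicit count is usually the safer choice.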
If the \verb|make| command completes without errors, the following files appear in the \verb|OpenCL_helpers/build| directory:\par
\indent\indent\verb|liboclh.so.|\verbI{I}\verb|.|\verbI{J}%
\verb| oclh_br oclh_cr oclh_lr|\par
\noindent%
along with a few \verb|*.o| subdirectories containing object files. In the name of the first file, \verbI{I} is the major version of the library and \verbI{J} is the minor version. Each file can be built separately with the commands:\par
\indent\indent\verb|make oclh_library|\par
\indent\indent\verb|make oclh_builder|\par
\indent\indent\verb|make oclh_compiler|\par
\indent\indent\verb|make oclh_linker|\par
If necessary, the library can be built for debugging with the command:
\par
\indent\indent\verb|make debug|
\subsubsection{Installation}
Installation is performed by the command:\par
\indent\indent\verb|make install|\par
\noindent%
As a result, the \verb|~/opt/oclh| directory is created, and the executable files, the library file and the header files are copied into its \verb|bin|, \verb|lib| and \verb|include| subdirectories, respectively. After that, it is advisable to add the \verb|~/opt/oclh/bin| directory to the \verb|PATH| environment variable and the \verb|~/opt/oclh/lib| directory to the \verb|LD_LIBRARY_PATH| environment variable. The destination path can be changed with the command:\par
\indent\indent\verb|make PRFX_PATH=|\verbI{destination\_path}\verb| install|
\subsubsection{Uninstallation}
Uninstallation is performed by the command:\par
\indent\indent\verb|make uninstall|\par
\noindent
or\par
\indent\indent\verb|make PRFX_PATH=|\verbI{destination\_path}\verb| uninstall|
\par
\noindent%
if the library was installed in a non-default directory.
\subsubsection{Documentation}
The documentation of the OpenCL\_helpers library is built separately. Building the documentation requires the \XeTeX /\XeLaTeX\ typesetting system or another \TeX /\LaTeX-compatible system.
The \XeTeX\ system and related packages are provided with the \TeX ~Live~distribution~(\href{https://www.tug.org/texlive/}%
{\Verb|https://www.tug.org/texlive/|}). Using a system other than \XeTeX\ may require changes to the source code of the documentation. In addition to the typesetting system itself, a number of packages are required, for example xindy for building the index. All packages used to prepare the documentation are freely available as part of the \TeX ~Live~distribution.

The documentation itself is built in the \verb|OpenCL_helpers/documentation| directory by running the build script\par
\indent\indent\verb|./build_script|\par
\noindent%
If the script runs without errors, the following files appear in the \verb|OpenCL_helpers/documentation/build| directory:
\par
\indent\indent\verb|opencl_helpers_documentation-russian.pdf|\par
\indent\indent\verb|opencl_helpers_documentation-english.pdf|\par
\noindent%
which contain the documentation in Russian and English, respectively.

The build uses fonts of the IBM~Plex family; to fall back to the basic Computer~Modern family, uncomment the line\par
\indent\indent\verb|\input{fonts/font_settings-Computer_Modern.tex}|\par
\noindent
in the preamble of the documentation source code.
\subsection{Log file format}
\label{subsec:logformat}
\index{log file format}%
The library provides tools for logging events that occur in an application to log files; in addition, the library itself writes to the log file when necessary. The logging functions are described in~s.\ref{subsec:logfunctions}.
A standard log file entry looks like\par
\begin{CodeParWithCC}{\\\{\}}
\textit{YYYY}-\textit{MM}-\textit{DD} \textit{hh}:\textit{mm}:\textit{ss} ws_0x\textit{HHHH} \textit{entry\_content}
\end{CodeParWithCC}
\noindent%
where
{%
\setlength{\leftskip}{0pt}%
\setlength{\LTpre}{\smallskipamount}\setlength{\LTpost}{\smallskipamount}%
\setlength{\LTleft}{2\parindent-\tabcolsep}
\SetLenVarWithWidth{\Acol}{\verbI{YYYY}}%
\SetLenVarWithWidth{\Bcol}{--}%
\begin{longtable}%
 {p{\Acol}p{\Bcol}p{\linewidth-\LTleft-\Acol-\Bcol-5\tabcolsep}}
 \verbI{YYYY}&--&year, written as four decimal digits;\\
 \verbI{MM}&--&month of the year, written as two decimal digits from 01 to 12;\\
 \verbI{DD}&--&day of the month, written as two decimal digits from 01 to 31;\\
 \verbI{hh}&--&hour of the day, written as two decimal digits from 00 to 23;\\
 \verbI{mm}&--&minute of the hour, written as two decimal digits from 00 to 59;\\
 \verbI{ss}&--&second of the minute, written as two decimal digits from 00 to 59;
 \\
 \verbI{HHHH}&--&the last two bytes of the address of the GPGPU device working configuration (workset; for details see s.\ref{subsec:structures}), written as four hexadecimal digits.
\end{longtable}
}
\noindent%
\verbI{entry\_content}~--~can be any text passed to a logging function, but the library itself follows, where possible, the following conventions:
\begin{enumerate}
 \item Information related to OpenCL instances is recorded as \verbI{instance\_type}\verb|_0x|\verbI{HHHH}, where \verbI{HHHH} is the last two bytes of the instance address, written as four hexadecimal digits. For example, a GPGPU device could be recorded as \verb|dev_0x2a78|, and a platform as \verb|platform_0xf190|. An exhaustive list of OpenCL instances is given in the OpenCL specification.
 \item The symbol ``\verb+|+'' is used to delimit information blocks within an entry and to mark how such blocks relate to each other.
 For example, the entry\nopagebreak
\begin{CodePar}
2019-06-03 15:42:47 ws_0x9c00 context_0x9f60 | dev_0xf260 | ...
\end{CodePar}
 means that the entry describes an event related to the OpenCL context \verb|0x9f60|, which uses the GPGPU device \verb|0xf260|.
 \item When the recorded information is a clarification of a preceding entry, an additional space is placed before it, for example:\nopagebreak
{\scriptsize
\begin{CodePar}
2019-06-03 15:42:47 ws_0x9c00 context_0x9f60 | Reference count: 1
2019-06-03 15:42:47 ws_0x9c00 context_0x9f60 | Number of devices: 1
2019-06-03 15:42:47 ws_0x9c00 context_0x9f60 | Device ID(s): 0x1acf260
2019-06-03 15:42:47 ws_0x9c00 context_0x9f60 | dev_0xf260 | GPU: 15 units/17...
2019-06-03 15:42:47 ws_0x9c00 context_0x9f60 | dev_0xf260 | Memory: 8116.43...
2019-06-03 15:42:47 ws_0x9c00 context_0x9f60 | dev_0xf260 | Vendor: NVIDIA Corp...
2019-06-03 15:42:47 ws_0x9c00 context_0x9f60 | dev_0xf260 | Model: GeForce GT...
2019-06-03 15:42:47 ws_0x9c00 context_0x9f60 | Context properties:
2019-06-03 15:42:47 ws_0x9c00 context_0x9f60 | Platform: 0xf190
2019-06-03 15:42:47 ws_0x9c00 context_0x9f60 | platform_0xf190 | Profile: FULL_PROFILE
2019-06-03 15:42:47 ws_0x9c00 context_0x9f60 | platform_0xf190 | Version: OpenCL 1...
2019-06-03 15:42:47 ws_0x9c00 context_0x9f60 | platform_0xf190 | Name: NVIDIA CUDA
2019-06-03 15:42:47 ws_0x9c00 context_0x9f60 | platform_0xf190 | Vendor: NVIDIA Corp...
2019-06-03 15:42:47 ws_0x9c00 context_0x9f60 | platform_0xf190 | Extensions: cl_khr...
2019-06-03 15:42:47 ws_0x9c00 context_0x9f60 | Is user responsible for sync: Undefined (presumable No)
\end{CodePar}
}
 \item If an error occurs while a library function is executing, an entry starting with \verb|oclerr:| is added to the log; it contains information about all function calls from the library to the OpenCL API.
 For example, the entry\nopagebreak
\begin{CodeParWithCC}{\\\{\}}
\textit{YYYY}-\textit{MM}-\textit{DD} \textit{hh}:\textit{mm}:\textit{ss} ws_0x\textit{HHHH} oclerr: _ghf_getBuildStatus/clGetProgramBuildInfo/CL_PROGRAM_BUILD_STATUS returned error -3 - CL_COMPILER_NOT_AVAILABLE
\end{CodeParWithCC}
 means that the \verb|_ghf_getBuildStatus| function called the OpenCL API function \verb|clGetProgramBuildInfo| with the argument \verb|CL_PROGRAM_BUILD_STATUS| and received the error code \verb|-3| in response, which stands for \verb|CL_COMPILER_NOT_AVAILABLE|.
\end{enumerate}
Given that OpenCL instance addresses are unique within one application run, it is highly likely that the combination of the instance name and the last two bytes of its address is also unique. These conventions therefore make it possible to extract the information about a particular OpenCL instance from the log file by substring filtering.

In addition to the standard log entry, there is also a header entry, which looks like\par
{\small%
\begin{CodeParWithCC}{\\\{\}}
\textit{YYYY}-\textit{MM}-\textit{DD} \textit{hh}:\textit{mm}:\textit{ss} ws_0x\textit{HHHH} __________
\textit{YYYY}-\textit{MM}-\textit{DD} \textit{hh}:\textit{mm}:\textit{ss} ws_0x\textit{HHHH} \textit{Title_text}
\textit{YYYY}-\textit{MM}-\textit{DD} \textit{hh}:\textit{mm}:\textit{ss} ws_0x\textit{HHHH} ~~~~~~~~~~
\end{CodeParWithCC}
}\par
\noindent
and a delimiter entry, which looks like\par
{\small%
\begin{CodeParWithCC}{\\\{\}}
\textit{YYYY}-\textit{MM}-\textit{DD} \textit{hh}:\textit{mm}:\textit{ss} ws_0x\textit{HHHH} ____________________________________________________
\textit{YYYY}-\textit{MM}-\textit{DD} \textit{hh}:\textit{mm}:\textit{ss} ws_0x\textit{HHHH} ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
\end{CodeParWithCC}
}
\input{name_conventions-english.tex}
\section{OpenCL programs build tools}
\label{sec:buildutils}
The library includes three executable files:\nopagebreak\par
\begin{itemize}[leftmargin=1.75\parindent]
 \item\verb|oclh_cr| -- compiles an OpenCL program into an OpenCL object;
 \item\verb|oclh_lr| -- links OpenCL objects;
 \item\verb|oclh_br| -- completely builds an OpenCL program.
\end{itemize}\nopagebreak\par
While these programs run, a detailed diagnostic log is written to the \verb|oclh_*r.log| file (named after the tool); it holds exhaustive information on all available GPGPU devices, the platforms used, and the contexts created for the build. In fact, you can run, for example, \verb|oclh_cr| with any input file, even with itself, as \verb|./oclh_cr oclh_cr|. The input file will of course not be built into an OpenCL object, but the \verb|oclh_cr.log| log file will contain complete information on the GPGPU devices found in the system. The log file format is human-readable and suited to substring searches with the \verb|grep| command and similar tools; it is described~in~s.\ref{subsec:logformat}.

Let us take a look at the use cases for each of these tools.
\input{tools_compiler-english.tex}
\input{tools_linker-english.tex}
\input{tools_builder-english.tex}
\section{Using the OpenCL\_helpers library. Structures, functions and headers}
\label{sec:libraryusing}
Stub. The section will be completed after sufficient testing of functionality.
\subsection{Structures}
\label{subsec:structures}
Stub. The section will be completed after sufficient testing of functionality.
\subsubsection{Main structure of the working configuration}
\label{subsec:workset}
Stub. The section will be completed after sufficient testing of functionality.
\subsection{Logging functions}
\label{subsec:logfunctions}
Stub. The section will be completed after sufficient testing of functionality.
\subsection{Common header files for CPU and GPGPU code}
\label{subsec:sharedheaders}
Stub. The section will be completed after sufficient testing of functionality.
\section{Memory management and pointer reinterpretation in OpenCL C programs} \label{sec:memalloc} Stub. The section will be completed after sufficient testing of functionality. \section{Parallelism inside GPU} \label{sec:squadmodel} Stub. The section will be completed after sufficient testing of functionality. \let\originalstyle=\thispagestyle \def\thispagestyle#1{} \printindex \let\thispagestyle=\originalstyle \tableofcontents \end{document}