groups of ASCII NUL ('\0') characters inserted into make output, but only for parallel builds

Alan W. Irwin-2
My parallel builds on my Linux OS (currently Debian Buster, but this
also happened with Debian Jessie, so this is a long-standing problem)
have ASCII NUL ('\0') characters inserted in the output, while the
corresponding non-parallel build does not have those extra characters.
For example, for PLplot, starting from a clean slate by configuring
PLplot in an initially empty build tree:

software@merlin> make -j16 test_noninteractive >& test_noninteractive.out
software@merlin> od -c test_noninteractive.out |grep '\\0'
0713660   g   r   a   d   i   e   n   t   .  \n  \n  \0  \0  \0  \0  \0
0713700  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0717000  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0717540  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0   O   u   t
0722160   1   4   f   .   f   9   0   .   o  \n  \0  \0  \0  \0  \0  \0
0722200  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0722740  \0  \0  \0  \0  \0  \0  \0   O   u   t   p   u   t       f   i

Note that the PLplot build system configures the test_noninteractive
target to build the prerequisites of all non-interactive test commands
and then run those test commands.  (These test commands are just
executables that I build; they are not formal CMake test commands
that are run by, for example, ctest.)  Also note that the '\0'
characters occur in groups (there are three such groups in the output),
but because parallel builds mix target outputs together, it is
impossible to tell which target is generating them.  If I drop the
parallel-build option (-j16), the issue goes away.

Note that this could be an operating-system problem rather than a
CMake problem.  For example, my shell (bash) might not properly handle
many different applications writing to stdout simultaneously.  But if
so, the issue is difficult to replicate.  For example, I wrote a simple
Makefile whose all target had 10 different prerequisites, each of which
used the cat command to output the UTF-8 contents of the same file,
where the file was long enough (~10000 characters in this case) that
the different outputs were mixed together in the parallel build.  That
test failed to replicate the issue: the parallel and non-parallel
builds produced output with the same number of characters (10 times
the number of characters in the file), and further inspection with
"od -c" showed no instances of "\0" in either the parallel or
non-parallel output.
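For anyone who wants to repeat that experiment without make, here is a minimal bash sketch of it. It assumes that background jobs are a fair stand-in for make -j10 prerequisites; the file names input.txt and combined.out are illustrative. The property it mimics is that every writer inherits the same already-open stdout, so all writes advance one shared file offset:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory that is removed on exit.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical input: 1000 lines of 10 bytes each = 10000 ASCII characters.
for i in $(seq 1 1000); do
  printf 'line %04d\n' "$i"
done > input.txt

# Ten concurrent cat jobs whose stdout is the SAME open file,
# as with recipes run by "make -j10 > combined.out".
{
  for (( job = 0; job < 10; job++ )); do
    cat input.txt &
  done
  wait
} > combined.out

# Total size and number of NUL bytes in the combined output.
size=$(( $(wc -c < combined.out) ))
nuls=$(( $(tr -cd '\000' < combined.out | wc -c) ))
echo "bytes=$size NULs=$nuls"
```

On a current Linux system this should report 100000 bytes and no NULs, consistent with the result described above.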

So two questions for those who are lurking here:

1. Have you ever experienced this problem when building a target with
large numbers of prerequisites in parallel with CMake, and if so, what
was the fix?

2. It is possible (although I cannot imagine why the symptoms would
be limited to the emission of extra '\0' characters) that this issue
is a symptom of a parallel-build race condition in the PLplot build
system.  To help rule out or confirm this possibility, what is the most
fool-proof method of diagnosing whether such a race exists?

Alan
__________________________
Alan W. Irwin

Programming affiliations with the FreeEOS equation-of-state
implementation for stellar interiors (freeeos.sf.net); the Time
Ephemerides project (timeephem.sf.net); PLplot scientific plotting
software package (plplot.sf.net); the libLASi project
(unifont.org/lasi); the Loads of Linux Links project (loll.sf.net);
and the Linux Brochure Project (lbproject.sf.net).
__________________________

Linux-powered Science
__________________________
--

Powered by www.kitware.com

Please keep messages on-topic and check the CMake FAQ at: http://www.cmake.org/Wiki/CMake_FAQ

Kitware offers various services to support the CMake community. For more information on each offering, please visit:

CMake Support: http://cmake.org/cmake/help/support.html
CMake Consulting: http://cmake.org/cmake/help/consulting.html
CMake Training Courses: http://cmake.org/cmake/help/training.html

Visit other Kitware open-source projects at http://www.kitware.com/opensource/opensource.html

Follow this link to subscribe/unsubscribe:
https://cmake.org/mailman/listinfo/cmake

Re: groups of ASCII NUL ('\0') characters inserted into make output, but only for parallel builds

Nils Gladitz-2


On Sun, Jul 8, 2018 at 11:00 PM Alan W. Irwin <[hidden email]> wrote:
> My parallel builds on my Linux OS (currently Debian Buster, but this
> also happened for Debian Jessie so this is a long-standing problem)
> have ascii null ('\0) characters inserted in the output while the
> corresponding non-parallel build does not have those extra characters.

I narrowed it down with Ninja, which buffers command output in parallel builds and so makes it easier to match output to specific commands.
Alternatively, I think CTest launchers might have helped with this too. CMake uses them to redirect build command output to distinct files for CTest submissions.
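The idea behind both suggestions, giving each command its own output buffer or file so that stray bytes can be attributed to a specific command, can be sketched in bash as follows. This is not how Ninja or CTest launchers are implemented; run_logged and the job names are hypothetical, and job_b deliberately emits NUL bytes so the scan has something to find:

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
mkdir logs

# Hypothetical helper: run a command with stdout and stderr captured
# in a per-job log file instead of the shared terminal or output file.
run_logged() {
  local name=$1
  shift
  "$@" > "logs/$name.log" 2>&1
}

# Three concurrent jobs; job_b deliberately writes three NUL bytes.
run_logged job_a printf 'output from a\n' &
run_logged job_b printf 'b\0\0\0\n' &
run_logged job_c printf 'output from c\n' &
wait

# Report which job logs contain NUL bytes.
offenders=()
for log in logs/*.log; do
  count=$(( $(tr -cd '\000' < "$log" | wc -c) ))
  if (( count > 0 )); then
    echo "$log: $count NUL byte(s)"
    offenders+=("$(basename "$log" .log)")
  fi
done
```

Because each job writes only to its own log, the report points at job_b alone, which is exactly the attribution that interleaved make -j16 output cannot provide.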

I see null bytes coming from the "cat test.error" in the script file generated from https://sourceforge.net/p/plplot/plplot/ci/master/tree/plplot_test/test_c.sh.in

If this only shows up in parallel builds, perhaps multiple processes are reading/writing the same test.error file in parallel?
Haven't looked closer than that. You probably know better where to look from there.
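One plausible mechanism (not confirmed anywhere in this thread) for why a shared file would yield NUL bytes rather than merely scrambled text: assume each script truncates test.error when it opens it, as a plain "2> test.error" redirection would. If one process still holds the file open at some write offset while another script re-opens the same name with truncation, the first process's next write lands past the new end of file, and on POSIX systems the skipped-over gap reads back as zero bytes. The deterministic bash sketch below uses file descriptor 3 as a stand-in for the first process, an assumption made to avoid timing-dependent results:

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# "Script A" opens test.error with truncation and writes 100 bytes,
# leaving its file offset at 100.
exec 3> test.error
printf 'A%.0s' {1..100} >&3

# "Script B" opens the same name with truncation and writes 6 bytes;
# the file is now 6 bytes long, but A's offset is still 100.
printf 'short\n' > test.error

# A writes again at offset 100, so bytes 6..99 become a gap of NULs.
printf 'more\n' >&3
exec 3>&-

od -c test.error
size=$(( $(wc -c < test.error) ))
nuls=$(( $(tr -cd '\000' < test.error | wc -c) ))
echo "bytes=$size NULs=$nuls"
```

This should print bytes=105 NULs=94, and the od -c listing shows a run of \0 between "short" and "more", much like the groups in the original report. Opening in append mode ("2>> test.error") would avoid the gap, though not the mixing of two scripts' messages; giving each script its own file name removes the shared file altogether.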

Nils

Re: groups of ASCII NUL ('\0') characters inserted into make output, but only for parallel builds (SOLVED)

Alan W. Irwin-2
On 2018-07-09 09:10+0200 Nils Gladitz wrote:

> On Sun, Jul 8, 2018 at 11:00 PM Alan W. Irwin <[hidden email]>
> wrote:
>
>> My parallel builds on my Linux OS (currently Debian Buster, but this
>> also happened for Debian Jessie so this is a long-standing problem)
>> have ascii null ('\0) characters inserted in the output while the
>> corresponding non-parallel build does not have those extra characters.
>>
>
> I narrowed it down with Ninja which buffers command outputs in parallel
> builds which makes it easier to match output to specific commands.
> Alternatively I think CTest launchers might have helped with this too.
> CMake uses them to redirect build command outputs to distinct files for
> CTest submissions.
>
> I see null bytes coming from the "cat test.error" in the script file
> generated from
> https://sourceforge.net/p/plplot/plplot/ci/master/tree/plplot_test/test_c.sh.in
>
> If this only shows up in parallel builds perhaps multiple processes are
> reading/writing the same test.error file in parallel?
> Haven't looked closer than that. You probably know better where to look
> from there.

Hi Nils:

Many thanks for going "above and beyond" with this issue.  Indeed,
many of the language test scripts in plplot_test write to "test.error"
and then, after the command has run, output it with "cat
test.error".  Many of those test.error files are in the same
directory, so this constitutes a many-way name clash and therefore a
set of clear race conditions for parallel builds that has existed in
PLplot for at least the last decade.  UGH!

I fixed those name clashes with PLplot commit a4bada004 (see that log
message for additional commentary), and the extra '\0' characters in
parallel output are now gone.  I have no idea why the symptom of the
race is extra '\0' characters, but I am just happy to have both these
symptoms and the underlying race conditions fixed!
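PLplot's actual change is in commit a4bada004 and is not reproduced here. As a generic illustration of the same idea, each invocation can capture stderr in a uniquely named file from mktemp, so concurrently running scripts never share one. The helper run_test and the sh -c stand-in test commands are hypothetical:

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical helper: run a command with stderr captured in a private,
# uniquely named file, then show that stderr and return the command's status.
run_test() {
  local errfile status
  errfile=$(mktemp "$workdir/test.error.XXXXXX")
  if "$@" 2> "$errfile"; then
    status=0
  else
    status=$?
  fi
  cat "$errfile"
  rm -f "$errfile"
  return "$status"
}

# Four concurrent "tests", each writing a distinct message to stderr.
for i in 1 2 3 4; do
  run_test sh -c "echo \"error from job $i\" >&2" > "out$i.txt" &
done
wait
```

Each outN.txt ends up holding only its own job's message, however the jobs interleave. Per-test names derived from the language and device would serve equally well; mktemp simply guarantees uniqueness without any naming scheme.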

Thanks for your key help in leading me to the solution of
this long-standing PLplot build-system issue.

Alan