ExternalProject, continuous integration and caching builds

ExternalProject, continuous integration and caching builds

Antoine Pitrou

Hello,

On our project (Apache Arrow - https://arrow.apache.org/) we're using
CMake for the C++ source tree and have many external dependencies
fetched using ExternalProject.  In turn building those dependencies can
make up a significant portion of build times on CI services, especially
AppVeyor.  So I've been looking for a solution to cache those
third-party builds from one CI run to the other.

Right now, what I'm trying to do is to set EP_BASE to a well-known base
directory and ask AppVeyor to cache and restore that directory in each
new build.  The AppVeyor caching seems to work fine (the EP_BASE
directory is saved and restored).  Nevertheless, CMake still rebuilds
all of those projects, despite the cached build results.
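
A minimal sketch of the setup described above, assuming a superbuild-style CMakeLists.txt. The project name, the THIRDPARTY_BASE variable, the zstd_ep target name, and the zstd version and URL are illustrative assumptions, not taken from Arrow's build files; only EP_BASE and the cache directory come from this message.

```cmake
cmake_minimum_required(VERSION 3.12)
project(externals_sketch NONE)

include(ExternalProject)

# Assumed cache location: the directory that the CI service saves and
# restores between builds.
set(THIRDPARTY_BASE "C:/Users/appveyor/arrow-externals" CACHE PATH
    "Base directory for all ExternalProject trees")

# EP_BASE is a directory property. ExternalProject_Add() calls made in this
# directory place their tmp/, Stamp/, Download/, Source/, Build/ and
# Install/ trees under it.
set_property(DIRECTORY PROPERTY EP_BASE "${THIRDPARTY_BASE}")

# Illustrative dependency; the version and URL are assumptions.
ExternalProject_Add(zstd_ep
  URL "https://github.com/facebook/zstd/archive/v1.3.5.tar.gz"
  SOURCE_SUBDIR "build/cmake"   # zstd keeps its CMake project here
  CMAKE_ARGS -DCMAKE_INSTALL_PREFIX=<INSTALL_DIR>
)
```

With this layout, the stamp files that record which steps have already run live under <base>/Stamp/zstd_ep, so they are part of what the CI cache has to restore intact.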

This is with CMake 3.12.1 on Windows.

Here is the log for an example build step (the zstd library):
https://ci.appveyor.com/project/pitrou/arrow/build/1.0.700/job/i4tj6tifp4xq1mjn?fullLog=true#L803

As you can see, CMake notices the downloaded tarball is up-to-date and
doesn't download it again, but it still extracts it again (why?) and
builds the source code anew.  Yet the entire EP_BASE directory (here
"C:/Users/appveyor/arrow-externals") is cached and restored by AppVeyor.

Did someone manage to make this work, and/or is there another solution?

Thank you

Regards

Antoine.
--

Powered by www.kitware.com

Please keep messages on-topic and check the CMake FAQ at: http://www.cmake.org/Wiki/CMake_FAQ

Kitware offers various services to support the CMake community. For more information on each offering, please visit:

CMake Support: http://cmake.org/cmake/help/support.html
CMake Consulting: http://cmake.org/cmake/help/consulting.html
CMake Training Courses: http://cmake.org/cmake/help/training.html

Visit other Kitware open-source projects at http://www.kitware.com/opensource/opensource.html

Follow this link to subscribe/unsubscribe:
https://cmake.org/mailman/listinfo/cmake

Re: ExternalProject, continuous integration and caching builds

Craig Scott-3


On Wed, Sep 5, 2018 at 9:56 PM, Antoine Pitrou <[hidden email]> wrote:

> Right now, what I'm trying to do is to set EP_BASE to a well-known base
> directory and ask AppVeyor to cache and restore that directory in each
> new build.  The AppVeyor caching seems to work fine (the EP_BASE
> directory is saved and restored).  However, it seems that nevertheless
> CMake will rebuild all those projects again, despite the cached build
> results.

When AppVeyor restores the cached directories and files, does it also preserve their timestamps? If not, that might explain why it always rebuilds.

 


--
Craig Scott
Melbourne, Australia



Re: ExternalProject, continuous integration and caching builds

Antoine Pitrou

On 05/09/2018 at 14:28, Craig Scott wrote:

> When AppVeyor restores the cached directories and files, does it also
> preserve their timestamps? If not, that might explain why it always
> rebuilds.

I do not know.  I've found this utility:
https://github.com/iboB/mtime_cache and will experiment with it.

Regards

Antoine.

Re: ExternalProject, continuous integration and caching builds

Antoine Pitrou

On 05/09/2018 at 14:37, Antoine Pitrou wrote:
>
>> When AppVeyor restores the cached directories and files, does it also
>> preserve their timestamps? If not, that might explain why it always
>> rebuilds.
>
> I do not know.  I've found this utility:
> https://github.com/iboB/mtime_cache and will experiment with it.

It turns out that doesn't solve the issue.  One likely explanation is
that fixing the timestamps on cached contents is not useful if
ExternalProject unpacks the downloaded tarball again and overwrites the
source files.

One complication is that we build many of those dependencies in-source
(using BUILD_IN_SOURCE), since they don't necessarily support out-of-tree
builds, so the extracted sources and the build outputs share one directory.

Regards

Antoine.

Re: ExternalProject, continuous integration and caching builds

Innokentiy Alaytsev
Hello!

You may try to perform some "caching" actions yourself: store some kind of
checksum for the archive with the downloaded sources, and only build the
ExternalProject if the checksum changes or if there is no previously built
ExternalProject with the same checksum. It is also possible to store only a
checksum of the archive link, provided the link is guaranteed to be different
for different dependency versions.
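
A rough sketch of that idea in CMake, under stated assumptions: the helper name add_cached_dependency and the .source_sha256 record file are hypothetical, the archive is assumed to be a local file that already exists when CMake runs (for example restored from the CI cache), and the install directory is assumed to be cached between CI runs. It is not the implementation linked below.

```cmake
include(ExternalProject)

# Hypothetical helper, not part of ExternalProject. Assumes `archive` is a
# local file present at configure time and `install_dir` is cached by CI.
function(add_cached_dependency name archive install_dir)
  file(SHA256 "${archive}" archive_hash)
  set(hash_record "${install_dir}/.source_sha256")

  # Read the hash recorded by a previous successful build, if any.
  set(previous_hash "")
  if(EXISTS "${hash_record}")
    file(READ "${hash_record}" previous_hash)
    string(STRIP "${previous_hash}" previous_hash)
  endif()

  if("${previous_hash}" STREQUAL "${archive_hash}")
    message(STATUS "${name}: reusing cached build in ${install_dir}")
    add_custom_target(${name})  # empty stand-in so add_dependencies() still works
    return()
  endif()

  # Remaining arguments (CONFIGURE_COMMAND, BUILD_IN_SOURCE, ...) are
  # forwarded unchanged.
  ExternalProject_Add(${name}
    URL "${archive}"
    URL_HASH SHA256=${archive_hash}
    INSTALL_DIR "${install_dir}"
    ${ARGN}
  )

  # Record the hash only after a successful install, so a failed build is
  # never mistaken for a cached one.
  file(WRITE "${CMAKE_CURRENT_BINARY_DIR}/${name}.sha256" "${archive_hash}\n")
  ExternalProject_Add_Step(${name} record_hash
    DEPENDEES install
    COMMAND ${CMAKE_COMMAND} -E copy
            "${CMAKE_CURRENT_BINARY_DIR}/${name}.sha256" "${hash_record}"
  )
endfunction()
```

The check runs when CMake configures, not when the build tool runs, so a local build tree needs a CMake rerun before it starts reusing a freshly built dependency.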

Here
<https://gitlab.com/UtilityToolKit/utk.cmake/blob/master/utk_cmake_package.cmake#L76>
is an ugly implementation of similar logic for a downloaded project.

Best regards,
Innokentiy




Re: ExternalProject, continuous integration and caching builds

Innokentiy Alaytsev
In reply to this post by Antoine Pitrou
Hello!

You may try to perform some kind of "caching" yourself: store a
checksum for the downloaded dependency archive and only build it if its
checksum changes or the dependency with this checksum has not already been
built. You may store only the checksum of the archive download link if it
is possible to guarantee that the link is different for different versions
of the dependency.

Here
<https://gitlab.com/UtilityToolKit/utk.cmake/blob/master/utk_cmake_package.cmake#L76>  
you may find an ugly implementation of similar logic.

Best regards,
Innokentiy




Re: ExternalProject, continuous integration and caching builds

Isaiah Norton
In reply to this post by Antoine Pitrou
> CMake notices the downloaded tarball is up-to-date and
> doesn't download it again, but it still extracts it again

From what I can tell, the 'check|download tarball' and 'extract tarball' commands are independent parts of the "download step": as long as the download step runs at all, it re-extracts the tarball, even if it skipped re-downloading. So the issue (design questions aside) is what triggers the download step. The minimal dependency for that step looks like a "LIBNAME-gitinfo.txt" file somewhere in the stamp directory, so you could check whether its mtime is preserved. I don't know whether other dependencies are added to the download target separately.
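
One way to check this is to print the modification time of every file in the stamp directories before the cache is saved and again after it is restored, then compare the two listings. A sketch as a CMake script, assuming the EP_BASE layout (stamp files under <EP_BASE>/Stamp/<name>); the script name and the EP_BASE command-line variable are hypothetical.

```cmake
# print_stamps.cmake (hypothetical file name)
# Usage: cmake -DEP_BASE=C:/Users/appveyor/arrow-externals -P print_stamps.cmake
if(NOT DEFINED EP_BASE)
  message(FATAL_ERROR "Pass -DEP_BASE=<dir> before -P")
endif()

# With EP_BASE set, ExternalProject keeps per-project stamp files under
# <EP_BASE>/Stamp/<name>/.
file(GLOB_RECURSE stamp_files "${EP_BASE}/Stamp/*")

foreach(stamp IN LISTS stamp_files)
  # Print the last-modified time in UTC so listings from different
  # CI runs can be compared directly.
  file(TIMESTAMP "${stamp}" mtime "%Y-%m-%d %H:%M:%S" UTC)
  message(STATUS "${mtime}  ${stamp}")
endforeach()
```

If the two listings differ, or if a restored stamp file ends up older than a file it depends on, that would be consistent with the download step being considered out of date.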


Fwd: ExternalProject, continuous integration and caching builds

Chris Wilson
Hi all,

I faced exactly the same issue with Box Backup, and "solved" it by caching completely built packages in tarballs, replacing the ExternalProject with a different one that uses the cached package if it hasn't been invalidated (which we determine based on the CMakeLists.txt having changed).

You can see the superbuild file here:

We define a function that extracts a cached package:
function(ExternalProject_Use_Cache project_name package_file install_path)
	message(STATUS "Will use cached package file: ${package_file}")

	ExternalProject_Add(${project_name}
		DOWNLOAD_COMMAND ${CMAKE_COMMAND} -E echo
			"No download step needed (using cached package)"
		CONFIGURE_COMMAND ${CMAKE_COMMAND} -E echo
			"No configure step needed (using cached package)"
		BUILD_COMMAND ${CMAKE_COMMAND} -E echo
			"No build step needed (using cached package)"
		INSTALL_COMMAND ${CMAKE_COMMAND} -E echo
			"No install step needed (using cached package)"
	)

	# We want our tar files to contain the Install/<package> prefix (not for any
	# very special reason, only for consistency and so that we can identify them
	# in the extraction logs) which means that we must extract them in the
	# binary (top-level build) directory to have them installed in the right
	# place for subsequent ExternalProjects to pick them up. It seems that the
	# only way to control the working directory is with Add_Step!
	ExternalProject_Add_Step(${project_name} extract
		ALWAYS 1
		COMMAND
			${CMAKE_COMMAND} -E echo
			"Extracting ${package_file} to ${install_path}"
		COMMAND
			${CMAKE_COMMAND} -E tar xzvf ${package_file}
			${install_path}
			WORKING_DIRECTORY ${CMAKE_BINARY_DIR}
	)

	ExternalProject_Add_StepTargets(${project_name} extract)
endfunction()
And a function to create a new cached package:
function(ExternalProject_Create_Cache project_name package_file install_path)
	if(NOT EXISTS ${package_file})
		# Note: because this is evaluated when CMake is run, not when make is run, you will
		# need to rerun CMake to stop it repackaging your already-packaged sources. This
		# works well enough for the main use case, which is speeding up AppVeyor builds,
		# since the cache from a previous run is always restored before CMake is run, so if
		# we already have a package then this code will never be run.

		message(STATUS "Will create cached package file: ${package_file}")

		ExternalProject_Add_Step(${project_name} package
			DEPENDEES install
			BYPRODUCTS ${package_file}
			COMMAND ${CMAKE_COMMAND} -E echo "Updating cached package file: ${package_file}"
			COMMAND ${CMAKE_COMMAND} -E tar czvf ${package_file}
				${install_path}
		)

		ExternalProject_Add_StepTargets(${project_name} package)
	endif()
endfunction()
And call them like this:
file(MD5 ${CMAKE_CURRENT_LIST_FILE} cmake_lists_hash)

set(zlib_install_dir "${install_dir}/zlib")
set(zlib_package_file "${cache_dir}/zlib_${cmake_lists_hash}.tgz")

if(EXISTS ${zlib_package_file})
	ExternalProject_Use_Cache(zlib ${zlib_package_file} ${zlib_install_dir})
else()
	ExternalProject_Add(...)
	ExternalProject_Create_Cache(zlib ${zlib_package_file} ${zlib_install_dir})
endif()
Our AppVeyor configuration caches and restores these package files, along with the source tarballs used to rebuild them, but not the CMake build directory.

This cuts about 10 minutes off our normal build time for each configuration on AppVeyor (80 minutes total per commit tested, with 8 different configurations). But it does make local superbuilds a bit more tricky, as described in the comments: we often have to rerun CMake to use previously built cached packages instead of rebuilding them.

I hope this is interesting/helpful to others too.

Thanks, Chris.
