FetchContent/ExternalProject and URL_HASH

Dustyn Blasig
Hi All,

We are pulling some artifacts from Artifactory, which provides a checksum file alongside each artifact at <artifact-url>.md5 or <artifact-url>.sha256. If I do not include URL_HASH, does CMake automatically check whether such a checksum file exists and use its value for the hash check? Or is there a way to provide a URL for the checksum file rather than having to do file(DOWNLOAD <checksum-url>), file(STRINGS <checksum-file>), URL_HASH <algo>=<checksum-var>?

Thanks!

--

Powered by www.kitware.com

Please keep messages on-topic and check the CMake FAQ at: http://www.cmake.org/Wiki/CMake_FAQ

Kitware offers various services to support the CMake community. For more information on each offering, please visit:

CMake Support: http://cmake.org/cmake/help/support.html
CMake Consulting: http://cmake.org/cmake/help/consulting.html
CMake Training Courses: http://cmake.org/cmake/help/training.html

Visit other Kitware open-source projects at http://www.kitware.com/opensource/opensource.html

Follow this link to subscribe/unsubscribe:
https://cmake.org/mailman/listinfo/cmake

Re: FetchContent/ExternalProject and URL_HASH

Craig Scott-3


On Wed, Jul 17, 2019 at 12:59 PM Dustyn Blasig <[hidden email]> wrote:
Hi All,

We are pulling some artifacts from Artifactory, which provides a checksum file alongside each artifact at <artifact-url>.md5 or <artifact-url>.sha256. If I do not include URL_HASH, does CMake automatically check whether such a checksum file exists and use its value for the hash check? Or is there a way to provide a URL for the checksum file rather than having to do file(DOWNLOAD <checksum-url>), file(STRINGS <checksum-file>), URL_HASH <algo>=<checksum-var>?

The point of the checksum is to verify the file that was downloaded. It doesn't make a whole lot of sense to then download another file to provide that checksum; you'd just be moving the problem along one level of indirection. The assumption is that when you provide the URL to be downloaded, if you want to use a checksum then you should also be able to provide it along with the URL. When the URL is being constructed on the fly, though, this typically isn't true. In that case, any checksum you provide usually has to be downloaded as well, and would therefore need to be verified itself.

To answer your question more directly, CMake doesn't offer any feature to automatically download a checksum file (that I'm aware of). The file command expects the actual checksum, not a location from which to retrieve it, for the reasons mentioned above.
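
For illustration, here is a minimal sketch of the case described above, where the checksum is known up front and written directly next to the URL. The content name my_artifact, the URL, and the digest are all hypothetical; the digest shown is only a placeholder (the SHA-256 of empty input) and would need to be replaced with the real artifact's hash.

```cmake
cmake_minimum_required(VERSION 3.14)
project(UrlHashLiteral NONE)

include(FetchContent)

# Hypothetical content name and URL. The digest is a placeholder (the
# SHA-256 of empty input); substitute the hash of the real artifact.
FetchContent_Declare(
  my_artifact
  URL      "https://artifactory.example.com/repo/my_artifact-1.0.tar.gz"
  URL_HASH SHA256=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
)

# Downloads, verifies and unpacks the archive during the first configure.
FetchContent_MakeAvailable(my_artifact)
message(STATUS "Artifact unpacked to: ${my_artifact_SOURCE_DIR}")
```

Because the hash lives in the project itself, nothing besides the artifact has to be fetched or trusted.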
 
--
Craig Scott
Melbourne, Australia

Get the hand-book for every CMake user: Professional CMake: A Practical Guide

Re: FetchContent/ExternalProject and URL_HASH

Dustyn Blasig
Thanks for the info, Craig. 

I'm not very familiar with the intricacies of network downloads. If the download itself guarantees the file was transferred correctly and the checksum is only used to verify its authenticity, then we probably don't need it, as we're only downloading these artifacts from trusted internal sources. However, if the checksum is needed in some cases to verify that the download was actually received in one correct piece, then pulling the checksum file and failing if either file is corrupted is fine for our use case. Can we assume the former, that CMake (and the underlying tools) will guarantee the file is downloaded successfully even in the event of a CTRL-C interruption or other signals?


Re: FetchContent/ExternalProject and URL_HASH

Craig Scott-3


On Mon, Jul 22, 2019 at 10:37 AM Dustyn Blasig <[hidden email]> wrote:
Thanks for the info, Craig. 

I'm not very familiar with the intricacies of network downloads. If the download itself guarantees the file was transferred correctly and the checksum is only used to verify its authenticity, then we probably don't need it, as we're only downloading these artifacts from trusted internal sources. However, if the checksum is needed in some cases to verify that the download was actually received in one correct piece, then pulling the checksum file and failing if either file is corrupted is fine for our use case. Can we assume the former, that CMake (and the underlying tools) will guarantee the file is downloaded successfully even in the event of a CTRL-C interruption or other signals?

I don't think it is realistic to expect CMake or the underlying tools to still give you a successful file download if you interrupt it. ;)

One of the things the file(DOWNLOAD) command uses the checksum for is to check whether it can avoid downloading the file when it already exists. Without the checksum, if there is already a file at the destination, CMake can't tell whether it is the right file and will download it again each time. Something I hadn't considered in my previous reply is that if the file to be downloaded is very big, it may still end up being more efficient to download the separate checksum file each time and read the big file's checksum from it, which avoids re-downloading the big file if you've already got it from a previous run.
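
A rough sketch of that trade-off, fetching the small checksum file on every run and passing its value to file(DOWNLOAD) via EXPECTED_HASH, might look like the following. The URL and paths are hypothetical, and the layout of the .sha256 file (hex digest first, optionally followed by a file name) is an assumption about what the server provides.

```cmake
cmake_minimum_required(VERSION 3.14)
project(ChecksumFileDownload NONE)

# Hypothetical URL and local paths; adjust for the real repository layout.
set(artifact_url  "https://artifactory.example.com/repo/big-artifact.tar.gz")
set(artifact_file "${CMAKE_BINARY_DIR}/downloads/big-artifact.tar.gz")

# The small checksum file is fetched on every configure run.
file(DOWNLOAD "${artifact_url}.sha256" "${artifact_file}.sha256"
     STATUS checksum_status)
list(GET checksum_status 0 checksum_code)
if(NOT checksum_code EQUAL 0)
  message(FATAL_ERROR "Checksum download failed: ${checksum_status}")
endif()

# Assumption: the file starts with the hex digest, optionally followed by
# whitespace and a file name.
file(READ "${artifact_file}.sha256" checksum_text)
string(REGEX MATCH "^[0-9A-Fa-f]+" artifact_sha256 "${checksum_text}")
string(TOLOWER "${artifact_sha256}" artifact_sha256)

# With EXPECTED_HASH, an existing file whose hash already matches is not
# downloaded again; a mismatch after downloading is reported as an error.
file(DOWNLOAD "${artifact_url}" "${artifact_file}"
     EXPECTED_HASH SHA256=${artifact_sha256}
     SHOW_PROGRESS)
```

Note that this still needs network access on every configure to retrieve the checksum file.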

Another reason you might want to explicitly specify the checksum rather than download a checksum file is that after you've done a configure once, you won't need to do any network communication to get it for subsequent runs because the checksum can be used to confirm you already have the right file. This allows you to run configure while connected to the network, then disconnect and work offline thereafter (handy if you're on a laptop and travelling!).

Another use for the checksum file is to ensure you are receiving the file you expect from the source. This can help catch things like man-in-the-middle attacks or other malicious acts where the download is intercepted and some other file substituted. If you have a trustworthy connection to the source, this is less likely to be a concern for you, but I'll leave that to your own judgement.

 


Re: FetchContent/ExternalProject and URL_HASH

Dustyn Blasig
"I don't think it is realistic to expect CMake or the underlying tools to still give you a successful file download if you interrupt it. ;)"

Sorry, I didn't word that well : )

I was curious whether CMake uses handlers behind the scenes, similar to GNU Make, such that if a signal occurs it will clean up any partially written files so that the next build retries those targets. In this case, if the download is interrupted, will CMake know it needs to redo the download? Are the checksum files used to verify the download was successful, or are they really only useful for authenticity, as with man-in-the-middle attacks?

I definitely like the part about downloading and configuring once and then reusing the download offline; that is useful!

Re: FetchContent/ExternalProject and URL_HASH

fdk17


On Mon, Jul 22, 2019, at 11:51 AM, Dustyn Blasig wrote:
"I don't think it is realistic to expect CMake or the underlying tools to still give you a successful file download if you interrupt it. ;)"

Sorry, I didn't word that well : )

I was curious whether CMake uses handlers behind the scenes, similar to GNU Make, such that if a signal occurs it will clean up any partially written files so that the next build retries those targets. In this case, if the download is interrupted, will CMake know it needs to redo the download? Are the checksum files used to verify the download was successful, or are they really only useful for authenticity, as with man-in-the-middle attacks?

I definitely like the part about downloading and configuring once and then reusing the download offline; that is useful!

I recall FetchContent being smart enough to determine that it was interrupted by the user. The time stamps typically don't get updated until after the fetch and configure complete successfully, so it’ll just redo the pieces that need to be updated or didn’t run in the first place.

The checksums are used to verify the download was successful. The error detection in typical Ethernet and TCP/IP is not robust enough to catch every corrupted bit in large downloads. It’s one of the reasons why an md5sum accompanies ISO downloads. It’s less of an issue with small items, but it can still happen.

One reason for not downloading a checksum file is that, when verification fails, you don’t know whether the package had an issue or the checksum file did. You wouldn’t want to download a large file again because the checksum file had an error.

I can see how having this feature would be beneficial to you, but you may just be forced to download the checksum file independently and parse it first, then use its contents when downloading the real item.
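
As a sketch only, that workaround could be wrapped in a small helper for use with FetchContent. The function name declare_with_remote_sha256, the content name, and the URL are made up for illustration, and the .sha256 file is assumed to begin with a 64-character hex digest. The length check is there to catch a truncated or otherwise damaged checksum file before the large download starts.

```cmake
cmake_minimum_required(VERSION 3.14)
project(FetchWithRemoteChecksum NONE)

include(FetchContent)

# Hypothetical helper: downloads <url>.sha256, parses the digest from its
# first line, and declares the content with that digest as URL_HASH.
function(declare_with_remote_sha256 name url)
  set(checksum_file "${CMAKE_BINARY_DIR}/checksums/${name}.sha256")
  file(DOWNLOAD "${url}.sha256" "${checksum_file}" STATUS status)
  list(GET status 0 code)
  if(NOT code EQUAL 0)
    message(FATAL_ERROR "Could not download ${url}.sha256: ${status}")
  endif()

  # Assumption: the first line starts with the hex digest.
  file(STRINGS "${checksum_file}" first_line LIMIT_COUNT 1)
  string(REGEX MATCH "^[0-9A-Fa-f]+" digest "${first_line}")
  string(TOLOWER "${digest}" digest)

  # A SHA-256 digest is 64 hex characters; anything else means the
  # checksum file itself is damaged or has an unexpected format.
  string(LENGTH "${digest}" digest_length)
  if(NOT digest_length EQUAL 64)
    message(FATAL_ERROR "Unexpected checksum file content for ${name}")
  endif()

  FetchContent_Declare(${name} URL "${url}" URL_HASH "SHA256=${digest}")
endfunction()

declare_with_remote_sha256(
  my_artifact
  "https://artifactory.example.com/repo/my_artifact-1.0.tar.gz")
FetchContent_MakeAvailable(my_artifact)
```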


F
