parkr
changed the title from
Proposed fix for issue #5192
to
Fix issue where Windows drive name is stripped from Jekyll.sanitized_pathAug 22, 2016

parkr
changed the title from
Fix issue where Windows drive name is stripped from Jekyll.sanitized_path
to
Fix issue where Windows drive name is stripped from Jekyll.sanitized_path incorrectlyAug 22, 2016

This comment has been minimized.

Is there a test/scenario that shows where this would be useful/can we just drop the whole clean_path.sub!(%r!\A\w:/!, "/") bit? I'm trying to think of a legit case where we receive questionable_path with a leading drive letter AND clean_path.start_with?(base_directory) returns false AND where File.join(base_directory, clean_path.sub!(%r!\A\w:/!, "/")) is expected to do something sensible. It seems like, if clean_path contains a drive letter, it is a fully absolute path by definition. So, doing this substitution and then appending it on to base_directory would always result in an invalid path.

edited

Edited 1 time

ptoomey3 edited Aug 23, 2016 (most recent)

Is there a test/scenario that shows where this would be useful/can we just drop the whole clean_path.sub!(%r!\A\w:/!, "/") bit? I'm trying to think of a legit case where we receive questionable_path with a leading drive letter AND clean_path.start_with?(base_directory) returns false AND where File.join(base_directory, clean_path.sub!(%r!\A\w:/!, "/")) is expected to do something sensible. It seems like, if clean_path contains a drive letter, it is a fully absolute path by definition. So, doing this substitution and then appending it on to base_directory would always result in an invalid path.

This comment has been minimized.

@ptoomey3 it'll be a valid path in the sense that it'll work for home users, but it'll break every one of my installs of Jekyll. Our Jekyll users here have 4 drives, plus another 8 on the network depending on who they are. If they are working from their secondary drive (which is where we expect them to place their personal data as that is their personal drive to keep at no matter what) and this strips the path, Jekyll now thinks C:/ is the path, it is not, Jekyll just broke.

This fixes a problem while exacerbating a larger problem for other users and I can see potential security issues downstream (I'd have to test this because I'm not quite sure but I have a feeling this would affect Jekyll Assets which relies on this to ensure nothing silly is happening in a round-about-way and to keep Sprockets in check, along with the users.)

@ptoomey3 it'll be a valid path in the sense that it'll work for home users, but it'll break every one of my installs of Jekyll. Our Jekyll users here have 4 drives, plus another 8 on the network depending on who they are. If they are working from their secondary drive (which is where we expect them to place their personal data as that is their personal drive to keep at no matter what) and this strips the path, Jekyll now thinks C:/ is the path, it is not, Jekyll just broke.

This fixes a problem while exacerbating a larger problem for other users and I can see potential security issues downstream (I'd have to test this because I'm not quite sure but I have a feeling this would affect Jekyll Assets which relies on this to ensure nothing silly is happening in a round-about-way and to keep Sprockets in check, along with the users.)

This comment has been minimized.

I'm still unclear on when this test's behavior is desirable. It seems like the path to sitemap.xml is absolute, so under what condition does it make sense to transform it into a relative path that is appended on to the base path? I understand that this is largely just testing existing behavior, but I'm wondering if that behavior is needed.

I'm still unclear on when this test's behavior is desirable. It seems like the path to sitemap.xml is absolute, so under what condition does it make sense to transform it into a relative path that is appended on to the base path? I understand that this is largely just testing existing behavior, but I'm wondering if that behavior is needed.

This comment has been minimized.

The scenario is if we configured source to D:/ and destination to D:/site, the file sitemap.xml will be created in the source path before copying to destination (i.e copy D:/sitemap.xml to D:/site/sitemap.xml). That is where Jekyll build fail in my test bed.

The scenario is if we configured source to D:/ and destination to D:/site, the file sitemap.xml will be created in the source path before copying to destination (i.e copy D:/sitemap.xml to D:/site/sitemap.xml). That is where Jekyll build fail in my test bed.

This comment has been minimized.

I shudder to think of how many plugins and such we'd break if we changed Jekyll.sanitized_path to not just gracefully bolt the base_directory before the questionable_path, so I think we need to keep that behaviour.

I agree that the code at present is hella difficult to keep track of and reason about. A lot of that confusion comes down to the fact that the mysterious regexp and so on is not described through comments. For this critical piece of the codebase, maybe a line-by-line explanation would be helpful.

I shudder to think of how many plugins and such we'd break if we changed Jekyll.sanitized_path to not just gracefully bolt the base_directory before the questionable_path, so I think we need to keep that behaviour.

I agree that the code at present is hella difficult to keep track of and reason about. A lot of that confusion comes down to the fact that the mysterious regexp and so on is not described through comments. For this critical piece of the codebase, maybe a line-by-line explanation would be helpful.

This comment has been minimized.

@parkr I think the fix for #5276 should obsolete the regex that strip away drive name. I'm not sure if there is a valid reason to strip away any drive name. I think it is also good to remove the regex line that strip drive name, as it really doesn't make any sense.

Run a couple test locally in Windows machine and failed miserably. After all, removing the regex isn't really a good idea for Windows user.

edited

Edited 1 time

kwokfu edited Aug 29, 2016 (most recent)

@parkr I think the fix for #5276 should obsolete the regex that strip away drive name. I'm not sure if there is a valid reason to strip away any drive name. I think it is also good to remove the regex line that strip drive name, as it really doesn't make any sense.

Run a couple test locally in Windows machine and failed miserably. After all, removing the regex isn't really a good idea for Windows user.

This comment has been minimized.

The reason behind that 2 fail cases is because of how the File.expand_path behaves differently between Windows (gives "D:/sitemap.xml") and *nix (gives "/D:/sitemap.xml").

I do not want to add too much decision in Jekyll.sanitized_path, which might make it harder to read. I am thinking if there is a better way to determine if it is running in Windows environment, so that the same tests can be skipped in *nix environment.

edited

Edited 1 time

kwokfu edited Sep 23, 2016 (most recent)

The reason behind that 2 fail cases is because of how the File.expand_path behaves differently between Windows (gives "D:/sitemap.xml") and *nix (gives "/D:/sitemap.xml").

I do not want to add too much decision in Jekyll.sanitized_path, which might make it harder to read. I am thinking if there is a better way to determine if it is running in Windows environment, so that the same tests can be skipped in *nix environment.

This comment has been minimized.

@ashmaroli, thats because both the tests were written specifically with Windows environment in mind. Or maybe I should just assert differently between Windows and *nix (i.e. D:/site/sitemap.xml for Windows and D:/site/D:/sitemap.xml for *nix). What you say?

@ashmaroli, thats because both the tests were written specifically with Windows environment in mind. Or maybe I should just assert differently between Windows and *nix (i.e. D:/site/sitemap.xml for Windows and D:/site/D:/sitemap.xml for *nix). What you say?

This comment has been minimized.

@parkr, I'm still exploring this and thinking what if someone really did create a directory in *nix with a name that looks like Windows drive name (for whatever reason) and what will be the case that goes into the sanitized_path.

@parkr, I'm still exploring this and thinking what if someone really did create a directory in *nix with a name that looks like Windows drive name (for whatever reason) and what will be the case that goes into the sanitized_path.

This comment has been minimized.

@parkr, @ashmaroli, I believe current changes made is sufficient to handle both the reported bugs without altering the original design of the function. I think introducing the suggested logic for Windows-only will introduce duplicate codes or changing the designate behaviour of the function, hence, not really wanted to incorporate in this patch (may be in another patch with discussion on how this function should behave?).

@parkr, @ashmaroli, I believe current changes made is sufficient to handle both the reported bugs without altering the original design of the function. I think introducing the suggested logic for Windows-only will introduce duplicate codes or changing the designate behaviour of the function, hence, not really wanted to incorporate in this patch (may be in another patch with discussion on how this function should behave?).

This comment has been minimized.

I think introducing the suggested logic for Windows-only will introduce duplicate codes or changing the designate behaviour of the function, hence, not really wanted to incorporate in this patch (may be in another patch with discussion on how this function should behave?).

@kwokfu My primary concern is ensuring that all possible path traversal vectors are covered. This function exists to prevent path traversal. My secondary concern is doing The Right Thing with regard to drive names. What if you request C:/mysite and your site source is at D:/Users/My Documents/mysite? What is the expected behaviour there? Being able to tailor our path traversal checking to also respect drive names (instead of ignoring them as this PR does) is worth the potential code duplication.

@ptoomey3 If you have time to load this up and try to crack it, I'd love to have your approval before I merge. FWIW, it passes our checks in this test suite. Thanks!

I think introducing the suggested logic for Windows-only will introduce duplicate codes or changing the designate behaviour of the function, hence, not really wanted to incorporate in this patch (may be in another patch with discussion on how this function should behave?).

@kwokfu My primary concern is ensuring that all possible path traversal vectors are covered. This function exists to prevent path traversal. My secondary concern is doing The Right Thing with regard to drive names. What if you request C:/mysite and your site source is at D:/Users/My Documents/mysite? What is the expected behaviour there? Being able to tailor our path traversal checking to also respect drive names (instead of ignoring them as this PR does) is worth the potential code duplication.

@ptoomey3 If you have time to load this up and try to crack it, I'd love to have your approval before I merge. FWIW, it passes our checks in this test suite. Thanks!

This comment has been minimized.

I would also like to know is there any documentation or discussion restricting where can we create a new Jekyll site, like under /css? I'm getting error while generating site under that path. If that were not discussed previously, I might just file a bug and work from there.

I would also like to know is there any documentation or discussion restricting where can we create a new Jekyll site, like under /css? I'm getting error while generating site under that path. If that were not discussed previously, I might just file a bug and work from there.

This comment has been minimized.

@ashmaroli, just ran the same test in Windows and getting the same error, looks like same is happening to both platform. Following is the output for Windows. I'll paste the trace in issue if confirmed bug.

edited

Edited 1 time

kwokfu edited Sep 30, 2016 (most recent)

@ashmaroli, just ran the same test in Windows and getting the same error, looks like same is happening to both platform. Following is the output for Windows. I'll paste the trace in issue if confirmed bug.

This comment has been minimized.

@kwokfu, the bug you discovered seems to happen due to the presence of a subdirectory also named css. Deleting that directory resolves this edge-case-scenario.
Jekyll 3.3 no longer requires that directory to be present in the root. It'll be handled by the theme gem.

@kwokfu, the bug you discovered seems to happen due to the presence of a subdirectory also named css. Deleting that directory resolves this edge-case-scenario.
Jekyll 3.3 no longer requires that directory to be present in the root. It'll be handled by the theme gem.