-
Notifications
You must be signed in to change notification settings - Fork 24
SP-3354_delta-scan-setup #155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAdds a new Delta class and a CLI "delta" subcommand group with a "copy" action that reads an input file of source paths, creates a delta directory (optionally under a root), copies listed files preserving structure, emits the delta path (stdout or file), and returns an exit status. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant CLI as "scanoss CLI"
participant Delta
participant FS as "File System / IO"
User->>CLI: run "scanoss delta copy" --filepath <file> [--folder <f>] [--root-dir <r>] [--output <o>]
CLI->>CLI: validate args (filepath required)
CLI->>Delta: instantiate Delta(debug, trace, quiet, filepath, folder, output, root_dir)
CLI->>Delta: call copy(input_file)
Delta->>FS: create_delta_dir(folder, root_dir)
Delta->>FS: read input file lines
loop each input path
Delta->>FS: normalize & validate source path
alt valid and inside delta root
Delta->>FS: makedirs destination
Delta->>FS: copy source -> delta destination
else skip / log warning
end
end
Delta-->>CLI: return (status_code, delta_folder)
CLI->>FS: write delta_folder to output (file or stdout)
CLI->>User: exit with status_code
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
🔇 Additional comments (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
SCANOSS SCAN Completed 🚀
View more details on SCANOSS Action Summary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (3)
src/scanoss/cli.py (1)
2646-2691: Consider adding validation for the folder argument.The handler correctly validates the input file and initializes the output file. However, when
args.folderis provided, consider validating it's not an absolute path outside the working directory to prevent potential security issues.Additionally, the error message on line 2688 prints the raw exception object, which may not provide a user-friendly message. Consider formatting it:
- except Exception as e: - print_stderr(e) + except Exception as e: + print_stderr(f'ERROR: {e}')src/scanoss/delta.py (2)
30-49: Add class and method docstrings.The Delta class and
__init__method are missing descriptive docstrings. Comprehensive documentation helps users understand the purpose and usage of the class.Add docstrings:
class Delta(ScanossBase): """ - + Manages delta directory creation and file copying operations. + + Provides functionality to copy files listed in an input file to a target + directory while preserving directory structure, supporting delta/incremental + scanning workflows. """ def __init__( self, debug: bool = False, trace: bool = False, quiet: bool = False, filepath: str = None, folder: str = None, output: str = None, ): """ - + Initialize the Delta instance. + + :param debug: Enable debug logging + :param trace: Enable trace logging + :param quiet: Enable quiet mode (suppress non-error output) + :param filepath: Path to input file containing list of files to copy + :param folder: Optional target directory for delta files + :param output: Optional output file to write delta directory path """
101-108: Simplify the conditional logic.Line 105 uses
exist_ok=Trueimmediately after checking that the folder doesn't exist on line 101, which is redundant. Additionally, consider usingos.path.join()instead of string concatenation for the prefix in tempfile creation for better cross-platform compatibility.if folder and os.path.exists(folder): self.print_stderr(f'Folder {folder} already exists') return '' elif folder: - os.makedirs(folder, exist_ok=True) + os.makedirs(folder) else: folder = tempfile.mkdtemp(prefix="delta-", dir='.') return folder
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/scanoss/cli.py(5 hunks)src/scanoss/delta.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/scanoss/delta.py (1)
src/scanoss/scanossbase.py (2)
ScanossBase(28-107)print_to_file_or_stdout(83-94)
src/scanoss/cli.py (2)
src/scanoss/delta.py (2)
Delta(30-108)copy(51-88)src/scanoss/scanossbase.py (1)
print_stderr(45-49)
🪛 GitHub Actions: Lint
src/scanoss/delta.py
[error] 24-24: I001 Import block is un-sorted or un-formatted. Organize imports. 2 issues fixable with --fix.
src/scanoss/cli.py
[error] 25-25: I001 Import block is un-sorted or un-formatted. Organize imports. 2 issues fixable with --fix.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (4)
src/scanoss/cli.py (2)
923-959: LGTM!The delta subcommand structure follows the established patterns in the codebase (e.g., component, utils, crypto). The command hierarchy (
delta copy) is well-defined with appropriate aliases, help text, and argument configuration.
980-980: LGTM!The additions correctly integrate the
p_copyparser into the existing option groups, ensuring consistency with other subcommands.Also applies to: 1178-1178
src/scanoss/delta.py (2)
63-65: LGTM!The input file validation correctly checks for existence before proceeding.
24-26: Fix import sorting issue.The linter has flagged the import block as improperly sorted. Organize imports with standard library modules sorted alphabetically.
Apply this diff:
import os import shutil import tempfileNote: The imports appear correctly ordered visually. Run
ruff check --fixto apply the auto-fix suggested by the linter, which may address whitespace or other formatting concerns.Likely an incorrect or invalid review comment.
src/scanoss/cli.py
Outdated
| from .spdxlite import SpdxLite | ||
| from .threadeddependencies import SCOPE | ||
| from .utils.file import validate_json_file | ||
| from scanoss.delta import Delta |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix import sorting issue.
The import statement is flagged by the linter as being out of order. Following Python conventions, imports should be organized by standard library, third-party, and local modules, sorted alphabetically within each group.
Apply this diff to fix the import order:
+from .scantype import ScanType
+from .spdxlite import SpdxLite
from .threadeddependencies import SCOPE
from .utils.file import validate_json_file
-from scanoss.delta import Delta
+
+from .delta import Delta📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| from scanoss.delta import Delta | |
| from .scantype import ScanType | |
| from .spdxlite import SpdxLite | |
| from .threadeddependencies import SCOPE | |
| from .utils.file import validate_json_file | |
| from .delta import Delta |
🤖 Prompt for AI Agents
In src/scanoss/cli.py around line 87, the import "from scanoss.delta import
Delta" is out of order per lint rules; reorder imports into standard library,
third-party, then local modules and alphabetically within each group—place this
local package import with other local imports in the correct alphabetical
position (or move it below third-party imports) so import sorting conforms to
the project's linter.
SCANOSS SCAN Completed 🚀
View more details on SCANOSS Action Summary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/scanoss/cli.py (1)
36-36: Import ordering issue remains unresolved.The previous review comment about import ordering has not been addressed. The
from scanoss.delta import Deltaimport should be placed with other local imports in alphabetical order.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/scanoss/cli.py(5 hunks)src/scanoss/delta.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/scanoss/delta.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/scanoss/cli.py (2)
src/scanoss/delta.py (2)
Delta(31-109)copy(52-89)src/scanoss/scanossbase.py (1)
print_stderr(45-49)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (2)
src/scanoss/cli.py (2)
980-980: LGTM!Correctly adds the
p_copyparser to the common options lists for output and debugging flags, following the established pattern.Also applies to: 1178-1178
2646-2691: LGTM!The
delta_copyhandler correctly:
- Validates required inputs
- Initializes optional output file
- Instantiates Delta with CLI arguments
- Unpacks the (status, folder) tuple from Delta.copy()
- Handles exceptions with appropriate error reporting
The implementation follows established patterns from similar handlers.
SCANOSS SCAN Completed 🚀
View more details on SCANOSS Action Summary |
SCANOSS SCAN Completed 🚀
View more details on SCANOSS Action Summary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/scanoss/cli.py(6 hunks)src/scanoss/delta.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/scanoss/cli.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/scanoss/delta.py (1)
src/scanoss/scanossbase.py (2)
ScanossBase(28-107)print_to_file_or_stdout(83-94)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (1)
src/scanoss/delta.py (1)
1-28: LGTM!License header and imports are appropriate for the functionality.
…tside the delta dir
SCANOSS SCAN Completed 🚀
View more details on SCANOSS Action Summary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/scanoss/delta.py (2)
85-134: Strong improvements to security and error handling!The implementation now includes:
- UTF-8 encoding specification
- Comprehensive error handling
- Path normalization and sanitization
- Final safety check to prevent directory escapes
The path traversal concerns from previous reviews have been substantially addressed.
Optional simplification: The manual while loop at lines 112-113 for stripping leading
..components is functional but could be simplified:- # Use a normalised source for destination to prevent traversal - # Remove leading path separators and '..' components from destination - safe_dest_path = normalised_source.lstrip(os.sep).lstrip('/') - while safe_dest_path.startswith('..'): - safe_dest_path = safe_dest_path[2:].lstrip(os.sep).lstrip('/') + # Use a normalised source for destination to prevent traversal + # Remove leading path separators and ensure relative path + safe_dest_path = normalised_source.lstrip(os.sep).lstrip('/') + # Reject any remaining paths that start with '..' + if safe_dest_path.startswith('..'): + self.print_stderr(f'WARNING: Path traversal detected in {source_file}, skipping') + continueAlternatively, you could use
os.path.relpathto ensure the path is relative, though the current approach is adequate.
88-88: Consider usingstrip()for line processing.Using
strip()instead ofrstrip('\n\r')would be more robust and handle all types of whitespace and line endings (including potential leading whitespace).- source_file = line.rstrip('\n\r') + source_file = line.strip()
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/scanoss/delta.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/scanoss/delta.py (1)
src/scanoss/scanossbase.py (2)
ScanossBase(28-107)print_to_file_or_stdout(83-94)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (2)
src/scanoss/delta.py (2)
31-61: Excellent documentation!The class and constructor docstrings are now comprehensive and clearly document the purpose, parameters, and behavior. This addresses the previous review feedback effectively.
137-163: Well-structured directory management!The method properly handles all scenarios:
- Validates existing folders before creation
- Includes comprehensive error handling for both explicit and temporary directory creation
- Clear error messages guide users when operations fail
The previous review feedback regarding redundant
exist_ok=Truehas been successfully addressed.
SCANOSS SCAN Completed 🚀
View more details on SCANOSS Action Summary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/scanoss/cli.py(6 hunks)src/scanoss/delta.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/scanoss/delta.py (1)
src/scanoss/scanossbase.py (3)
ScanossBase(28-107)print_to_file_or_stdout(83-94)print_trace(65-70)
src/scanoss/cli.py (2)
src/scanoss/delta.py (2)
Delta(31-189)copy(65-127)src/scanoss/scanossbase.py (1)
print_stderr(45-49)
🪛 GitHub Actions: Lint
src/scanoss/delta.py
[error] 39-39: PLR0913 Too many arguments in function definition (7 > 6).
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (5)
src/scanoss/cli.py (5)
923-949: LGTM!The delta subcommand structure is well-organized and follows the established patterns in the codebase. The argument definitions are clear and appropriate for the copy operation.
970-970: Correct integration of output options.The
p_copyparser is properly added to the list of commands that support the--outputflag.
1168-1168: Correct integration of debug options.The
p_copyparser is properly added to the list of commands that support debug, trace, and quiet flags.
1189-1190: Correct validation for delta subcommand.The validation now properly includes 'delta' and 'dl', ensuring that users are prompted with help when running the command without a subcommand.
2637-2680: Well-structured handler function.The
delta_copyhandler follows the established patterns in the codebase:
- Validates required arguments
- Properly initializes output files
- Delegates business logic to the Delta class
- Includes appropriate exception handling with debug tracebacks
src/scanoss/delta.py
Outdated
| def __init__( | ||
| self, | ||
| debug: bool = False, | ||
| trace: bool = False, | ||
| quiet: bool = False, | ||
| filepath: str = None, | ||
| folder: str = None, | ||
| output: str = None, | ||
| root_dir: str = '.', | ||
| ): | ||
| """ | ||
| Initialise the Delta instance. | ||
| :param debug: Enable debug logging | ||
| :param trace: Enable trace logging | ||
| :param quiet: Enable quiet mode (suppress non-essential output) | ||
| :param filepath: Path to input file containing list of files to copy | ||
| :param folder: Target delta directory path (auto-generated if not provided) | ||
| :param output: Output file path for delta directory location (stdout if not provided) | ||
| """ | ||
| super().__init__(debug, trace, quiet) | ||
| self.filepath = filepath | ||
| self.folder = folder | ||
| self.output = output | ||
| self.root_dir = root_dir if root_dir else '.' | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider consolidating parameters to address linter warning.
The linter flags this function for having too many arguments (7 > 6). While not critical, you could optionally consolidate related parameters (e.g., debug, trace, quiet are often bundled in a config object) to improve maintainability.
Minor: Redundant fallback logic.
Line 63 has self.root_dir = root_dir if root_dir else '.', but the parameter already defaults to '.' on line 47. The fallback is unnecessary unless None can be explicitly passed.
🧰 Tools
🪛 GitHub Actions: Lint
[error] 39-39: PLR0913 Too many arguments in function definition (7 > 6).
SCANOSS SCAN Completed 🚀
View more details on SCANOSS Action Summary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/scanoss/delta.py (1)
64-64: Remove redundant fallback logic.Since
root_diralready defaults to'.'in the parameter definition (line 48), the conditional assignment is unnecessary.Apply this diff:
- self.root_dir = root_dir if root_dir else '.' + self.root_dir = root_dir
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/scanoss/delta.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/scanoss/delta.py (2)
src/scanoss/scanossbase.py (3)
ScanossBase(28-107)print_to_file_or_stdout(83-94)print_trace(65-70)src/scanoss/cli.py (1)
print_stderr(92-96)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (2)
src/scanoss/delta.py (2)
1-38: LGTM! Well-documented class definition.The license header, imports, and class docstring are clear and comprehensive. The docstring effectively explains the class purpose and behavior.
130-191: LGTM! Robust error handling and security practices.Both methods demonstrate solid implementation:
create_delta_dir: Properly validates folder existence, handles errors, and creates temporary directories with appropriate error catching.load_input_file: Uses explicit UTF-8 encoding, strips whitespace correctly, and removes leading separators (line 185) to prevent absolute path exploitation.The comprehensive error handling and return of
Noneon failure enables proper error propagation to callers.
SCANOSS SCAN Completed 🚀
View more details on SCANOSS Action Summary |
SCANOSS SCAN Completed 🚀
View more details on SCANOSS Action Summary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/scanoss/cli.py(6 hunks)src/scanoss/delta.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/scanoss/cli.py (2)
src/scanoss/delta.py (2)
Delta(31-191)copy(66-128)src/scanoss/scanossbase.py (1)
print_stderr(45-49)
src/scanoss/delta.py (1)
src/scanoss/scanossbase.py (3)
ScanossBase(28-107)print_to_file_or_stdout(83-94)print_trace(65-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (10)
src/scanoss/cli.py (6)
36-36: Import placement looks good.The Delta import is correctly placed in the local imports section and properly ordered alphabetically with other scanoss module imports.
923-949: Well-structured subcommand setup.The delta subcommand group and copy subcommand follow the established patterns in the codebase. The argument definitions are clear and properly typed, with appropriate help text for users.
970-970: Proper integration with output options.Adding
p_copyto the output options loop ensures the delta copy command supports the standard--outputflag consistently with other commands.
1168-1168: Correct integration with debug options.Including
p_copyin the debug/trace/quiet options loop ensures consistent logging behavior across all commands.
1189-1190: Validation properly updated.The addition of 'delta' and 'dl' to the subcommand validation check correctly addresses the past review concern. This ensures proper help display when the subcommand is omitted (e.g.,
scanoss-py delta).
2637-2673: Well-implemented command handler.The
delta_copyhandler follows the established pattern used by other inspection and processing commands in the codebase. It properly validates inputs, initializes resources, handles errors, and provides appropriate exit codes.src/scanoss/delta.py (4)
96-128: Comprehensive security implementation.The file copying logic includes multiple layers of path traversal protection:
- Path normalization and validation for '..' sequences
- Verification that sources exist within the expected root
- Final safety check ensuring destinations remain within the delta directory
The consistent use of
self.root_dirin bothdest_pathconstruction (line 109) andabs_folderresolution (line 112) ensures proper path alignment, addressing the critical concern from past reviews.
130-163: Robust directory creation logic.The
create_delta_dirmethod properly handles both explicit folder creation and automatic temporary directory generation. Error handling is comprehensive, and the method correctly returnsNoneon failure to allow the caller to detect and handle errors appropriately.
165-191: Proper file loading with security considerations.The
load_input_filemethod correctly:
- Specifies UTF-8 encoding for cross-platform compatibility
- Uses
rstrip()to handle various line ending formats- Strips leading path separators to neutralize absolute paths from the input file
- Includes proper error handling
This implementation addresses the security and robustness concerns raised in past reviews.
31-191: Well-architected Delta class with strong security posture.The Delta class implementation demonstrates good software engineering practices:
- Clear separation of concerns across methods
- Comprehensive error handling and validation
- Multiple layers of security checks against path traversal attacks
- Consistent logging and debugging support
- Proper integration with the ScanossBase parent class
The implementation successfully addresses all critical and major issues raised in previous reviews.
SCANOSS SCAN Completed 🚀
View more details on SCANOSS Action Summary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (5)
src/scanoss/delta.py (5)
82-84: Validate the input is a file, not just a path that exists.- if not os.path.exists(input_file): + if not os.path.isfile(input_file): self.print_stderr(f'ERROR: Input file {input_file} does not exist') return 1, ''
24-28: Clean type hints: use Optional and standard generics.@@ -import os +import os import shutil import tempfile +from typing import Optional, List @@ - def create_delta_dir(self, folder: str, root_dir: str = '.') -> str or None: + def create_delta_dir(self, folder: str, root_dir: str = '.') -> Optional[str]: @@ - def load_input_file(self, input_file: str) -> list[str] or None: + def load_input_file(self, input_file: str) -> Optional[List[str]]:Also applies to: 130-131, 165-166
94-95: Emit an absolute delta path for clarity/consumers.- self.print_to_file_or_stdout(delta_folder, self.output) + self.print_to_file_or_stdout(os.path.abspath(os.path.join(self.root_dir, delta_folder)), self.output)
118-127: Optionally continue on per-file copy errors to process remaining entries.Current behavior aborts the entire run on first copy error. Consider logging and continuing, returning non‑zero if any failure occurred.
- except (OSError, shutil.Error) as e: - self.print_stderr(f'ERROR: Failed to copy {source_file} to {dest_path}: {e}') - return 1, '' + except (OSError, shutil.Error) as e: + self.print_stderr(f'ERROR: Failed to copy {source_file} to {dest_path}: {e}') + had_errors = True @@ - return 0, delta_folder + return (1 if 'had_errors' in locals() and had_errors else 0), delta_folder
180-186: Normalize both path separators when stripping leading separators.On Windows, inputs may use '/' or ''. Strip both for consistency, or defer all sanitization to the copy() normalization.
- if source_file: - # Save the file path without any leading separators - files.append(source_file.lstrip(os.sep)) + if source_file: + # Save path without any leading separators (handle '/' and '\') + leading = '/\\' if os.sep == '\\' else os.sep + files.append(source_file.lstrip(leading))
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/scanoss/delta.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/scanoss/delta.py (1)
src/scanoss/scanossbase.py (3)
ScanossBase(28-107)print_to_file_or_stdout(83-94)print_trace(65-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
| for source_file in files: | ||
| # Normalise the source path to handle ".." and redundant separators | ||
| normalised_source = os.path.normpath(source_file) | ||
| if '..' in normalised_source: | ||
| self.print_stderr(f'WARNING: Source path escapes root directory for {source_file}. Skipping.') | ||
| continue | ||
| # Resolve to the absolute path for source validation | ||
| abs_source = os.path.abspath(os.path.join(self.root_dir, normalised_source)) | ||
| # Check if the source file exists and is a file | ||
| if not os.path.exists(abs_source) or not os.path.isfile(abs_source): | ||
| self.print_stderr(f'WARNING: File {source_file} does not exist or is not a file, skipping') | ||
| continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Harden path validation: reject absolute/drive-qualified paths and block symlink escapes; use realpath for copy.
Prevents cross-platform escapes (Windows drive letters) and copying through symlinks that resolve outside root_dir.
@@
- for source_file in files:
- # Normalise the source path to handle ".." and redundant separators
- normalised_source = os.path.normpath(source_file)
- if '..' in normalised_source:
- self.print_stderr(f'WARNING: Source path escapes root directory for {source_file}. Skipping.')
- continue
- # Resolve to the absolute path for source validation
- abs_source = os.path.abspath(os.path.join(self.root_dir, normalised_source))
- # Check if the source file exists and is a file
- if not os.path.exists(abs_source) or not os.path.isfile(abs_source):
- self.print_stderr(f'WARNING: File {source_file} does not exist or is not a file, skipping')
- continue
+ for source_file in files:
+ # Normalize the source path to handle ".." and redundant separators
+ normalised_source = os.path.normpath(source_file)
+ # Reject absolute or drive-qualified paths (cross‑platform)
+ drive, _ = os.path.splitdrive(normalised_source)
+ if os.path.isabs(normalised_source) or drive:
+ self.print_stderr(f'WARNING: Absolute or drive-qualified path {source_file} not allowed. Skipping.')
+ continue
+ # Reject traversal that escapes the root
+ if normalised_source == '..' or normalised_source.startswith('..' + os.sep):
+ self.print_stderr(f'WARNING: Source path escapes root directory for {source_file}. Skipping.')
+ continue
+ # Resolve to the absolute path for source validation
+ abs_source = os.path.abspath(os.path.join(self.root_dir, normalised_source))
+ # Resolve symlinks and ensure the real path stays within root_dir
+ real_root = os.path.realpath(self.root_dir)
+ real_source = os.path.realpath(abs_source)
+ if not real_source.startswith(real_root + os.sep):
+ self.print_stderr(f'WARNING: Resolved source {real_source} escapes root directory. Skipping.')
+ continue
+ # Check if the resolved source is a regular file
+ if not os.path.isfile(real_source):
+ self.print_stderr(f'WARNING: File {source_file} does not exist or is not a file, skipping.')
+ continue
@@
- self.print_debug(f'Copying {source_file} to {dest_path} ...')
- shutil.copy(abs_source, dest_path)
+ self.print_debug(f'Copying {source_file} to {dest_path} ...')
+ shutil.copy(real_source, dest_path)Also applies to: 118-126
SCANOSS SCAN Completed 🚀
View more details on SCANOSS Action Summary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/scanoss/delta.py (1)
96-107: Harden source-path validation (absolute/drive paths, traversal, symlinks) and copy the real target.Current check
if '..' in normalised_sourcecan false‑positive and misses absolute/drive-qualified paths; symlink escapes aren’t blocked. Use segment-based traversal checks, reject abs/drive paths, verify resolved real path stays withinroot_dir, and copy fromreal_source.for source_file in files: # Normalise the source path to handle ".." and redundant separators normalised_source = os.path.normpath(source_file) - if '..' in normalised_source: - self.print_stderr(f'WARNING: Source path escapes root directory for {source_file}. Skipping.') - continue - # Resolve to the absolute path for source validation - abs_source = os.path.abspath(os.path.join(self.root_dir, normalised_source)) - # Check if the source file exists and is a file - if not os.path.exists(abs_source) or not os.path.isfile(abs_source): - self.print_stderr(f'WARNING: File {source_file} does not exist or is not a file, skipping') - continue + # Reject absolute or drive-qualified paths (cross-platform) + drive, _ = os.path.splitdrive(normalised_source) + if os.path.isabs(normalised_source) or drive: + self.print_stderr(f'WARNING: Absolute or drive-qualified path {source_file} not allowed. Skipping.') + continue + # Reject traversal that escapes the root (segment-aware) + if normalised_source == '..' or normalised_source.startswith('..' + os.sep): + self.print_stderr(f'WARNING: Source path escapes root directory for {source_file}. Skipping.') + continue + # Resolve and validate within root_dir, following symlinks + real_root = os.path.realpath(self.root_dir) + abs_source = os.path.abspath(os.path.join(self.root_dir, normalised_source)) + real_source = os.path.realpath(abs_source) + if not real_source.startswith(real_root + os.sep): + self.print_stderr(f'WARNING: Resolved source {real_source} escapes root directory. Skipping.') + continue + # Check that the resolved source is a regular file + if not os.path.isfile(real_source): + self.print_stderr(f'WARNING: File {source_file} does not exist or is not a file, skipping') + continue @@ - shutil.copy(abs_source, dest_path) + shutil.copy(real_source, dest_path)Also applies to: 124-124
🧹 Nitpick comments (5)
src/scanoss/delta.py (5)
53-59: Docstring: documentroot_dir.
__init__acceptsroot_dirbut the docstring doesn’t document it. Add a brief description.:param output: Output file path for the delta directory location (stdout if not provided). + :param root_dir: Root directory under which the delta directory is created and from which + relative source paths are resolved.
64-64: Normalizeroot_dironce to avoid CWD drift.Make
root_dirabsolute during init to prevent subtle path mix-ups later.- self.root_dir = root_dir if root_dir else '.' + self.root_dir = os.path.abspath(root_dir) if root_dir else os.getcwd()
82-84: Stronger input validation: ensure it’s a regular file.
exists()allows directories. Useisfile()for clarity.- if not os.path.exists(input_file): + if not os.path.isfile(input_file): self.print_stderr(f'ERROR: Input file {input_file} does not exist') return 1, ''
95-128: Don’t abort on first copy failure; continue and return non‑zero if any errors.Fail-fast stops processing good files. Continue, log errors, and reflect a nonzero exit code at the end.
- # Process each file and copy it to the delta dir - for source_file in files: + # Process each file and copy it to the delta dir + had_errors = False + for source_file in files: @@ except (OSError, shutil.Error) as e: self.print_stderr(f'ERROR: Failed to copy {source_file} to {dest_path}: {e}') - return 1, '' - return 0, delta_folder + had_errors = True + continue + return (1, delta_folder) if had_errors else (0, delta_folder)
183-189: Trim robustly and strip both separators for cross‑platform inputs.Handle CRLF and leading whitespace with
strip(), and strip both '/' and '' to neutralize absolute paths in mixed inputs.- for line in f: - source_file = line.rstrip() + for line in f: + source_file = line.strip() if source_file: # Save the file path without any leading separators - files.append(source_file.lstrip(os.sep)) + files.append(source_file.lstrip('/\\'))
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/scanoss/delta.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/scanoss/delta.py (1)
src/scanoss/scanossbase.py (3)
ScanossBase(28-107)print_to_file_or_stdout(83-94)print_trace(65-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
| if folder: | ||
| # Resolve a relative folder under root_dir so checks/creation apply to the right place | ||
| resolved = folder if os.path.isabs(folder) else os.path.join(root_dir, folder) | ||
| resolved = os.path.normpath(resolved) | ||
| # Validate the target directory doesn't already exist and create it | ||
| if os.path.exists(resolved): | ||
| self.print_stderr(f'ERROR: Folder {resolved} already exists.') | ||
| return None | ||
| else: | ||
| try: | ||
| self.print_debug(f'Creating delta directory {resolved}...') | ||
| os.makedirs(resolved) | ||
| except (OSError, IOError) as e: | ||
| self.print_stderr(f'ERROR: Failed to create directory {resolved}: {e}') | ||
| return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: create_delta_dir returns the un-resolved folder when provided; return the resolved path.
When folder is relative, you create <root_dir>/<folder> but return folder, which makes printed/output path inconsistent with the actual directory used.
if folder:
# Resolve a relative folder under root_dir so checks/creation apply to the right place
resolved = folder if os.path.isabs(folder) else os.path.join(root_dir, folder)
resolved = os.path.normpath(resolved)
@@
try:
self.print_debug(f'Creating delta directory {resolved}...')
os.makedirs(resolved)
except (OSError, IOError) as e:
self.print_stderr(f'ERROR: Failed to create directory {resolved}: {e}')
return None
+ # Ensure we return the resolved path
+ folder = resolved
@@
- return os.path.normpath(folder)
+ return os.path.normpath(folder)Also applies to: 166-166
🤖 Prompt for AI Agents
In src/scanoss/delta.py around lines 142 to 156 (and also at line 166), the
function resolves a relative folder into a full path (root_dir + folder) and
creates that directory, but still returns the original un-resolved folder which
causes inconsistent/incorrect paths; change the function to return the resolved
(normalized) path variable instead of the original folder variable and ensure
any subsequent return at line 166 also returns the resolved path so callers
always receive the actual created directory path.
SCANOSS SCAN Completed 🚀
View more details on SCANOSS Action Summary |
SCANOSS SCAN Completed 🚀
View more details on SCANOSS Action Summary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (3)
src/scanoss/delta.py (3)
96-107: Add symlink validation to prevent directory escape.The current validation doesn't resolve symlinks, which means a symbolic link could point outside
root_dirand bypass the path safety checks.Apply this diff to resolve and validate symlinks:
# Resolve to the absolute path for source validation abs_source = os.path.abspath(os.path.join(self.root_dir, normalised_source)) + # Resolve symlinks and ensure the real path stays within root_dir + real_root = os.path.realpath(self.root_dir) + real_source = os.path.realpath(abs_source) + if not real_source.startswith(real_root + os.sep) and real_source != real_root: + self.print_stderr(f'WARNING: Resolved source {real_source} escapes root directory. Skipping.') + continue # Check if the source file exists and is a file - if not os.path.exists(abs_source) or not os.path.isfile(abs_source): + if not os.path.isfile(real_source): self.print_stderr(f'WARNING: File {source_file} does not exist or is not a file, skipping') continueThen update line 124 to copy the resolved path:
self.print_debug(f'Copying {source_file} to {dest_path} ...') - shutil.copy(abs_source, dest_path) + shutil.copy(real_source, dest_path)
96-107: Refine path traversal check and add absolute path rejection.Line 99's substring check for
'..'will incorrectly reject legitimate filenames like"foo..bar.txt". The check should verify that the normalized path starts with'..'rather than merely containing it.Additionally, absolute paths in the input should be explicitly rejected before normalization to prevent confusion.
Apply this diff:
for source_file in files: # Normalise the source path to handle ".." and redundant separators normalised_source = os.path.normpath(source_file) + # Reject absolute paths + if os.path.isabs(normalised_source): + self.print_stderr(f'WARNING: Absolute path {source_file} not allowed. Skipping.') + continue - if '..' in normalised_source: + # Reject paths that escape the root directory + if normalised_source == '..' or normalised_source.startswith('..' + os.sep): self.print_stderr(f'WARNING: Source path escapes root directory for {source_file}. Skipping.') continue # Resolve to the absolute path for source validation abs_source = os.path.abspath(os.path.join(self.root_dir, normalised_source))
142-168: CRITICAL: Return the resolved path, not the original folder argument.When a relative
folderis provided, the code resolves it to an absolute path (line 144), creates the directory at that resolved location (line 153), but then returns the originalfoldervariable at line 168 instead of the resolved path. This creates an inconsistency where callers receive a path that doesn't match the actual directory that was created.Impact: The
copymethod receives an incorrect path fromcreate_delta_dir, which could lead to files being copied to the wrong location or path validation failures.Apply this diff:
if folder: # Resolve a relative folder under root_dir so checks/creation apply to the right place resolved = folder if os.path.isabs(folder) else os.path.join(root_dir, folder) resolved = os.path.normpath(resolved) # Validate the target directory doesn't already exist and create it if os.path.exists(resolved): self.print_stderr(f'ERROR: Folder {resolved} already exists.') return None else: try: self.print_debug(f'Creating delta directory {resolved}...') os.makedirs(resolved) except (OSError, IOError) as e: self.print_stderr(f'ERROR: Failed to create directory {resolved}: {e}') return None + # Update folder to the resolved path for consistent return + folder = resolved else: # Create a unique temporary directory in the given root directory try: self.print_debug(f'Creating temporary delta directory in {root_dir} ...') folder = tempfile.mkdtemp(prefix="delta-", dir=root_dir) if folder: folder = folder.removeprefix(root_dir).lstrip(os.sep) # Strip the root from the temp folder self.print_debug(f'Created temporary delta directory: {folder}') except (OSError, IOError) as e: self.print_stderr(f'ERROR: Failed to create temporary directory in {root_dir}: {e}') return None return folder
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/scanoss/delta.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/scanoss/delta.py (1)
src/scanoss/scanossbase.py (3)
ScanossBase(28-107)print_to_file_or_stdout(83-94)print_trace(65-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
| return 1, '' | ||
| return 0, delta_folder | ||
|
|
||
| def create_delta_dir(self, folder: str, root_dir: str = '.') -> str or None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Use proper union type syntax for return annotations.
The return type str or None is not valid type annotation syntax. Use str | None (Python 3.10+) or Optional[str] from typing.
Apply this diff:
- def create_delta_dir(self, folder: str, root_dir: str = '.') -> str or None:
+ def create_delta_dir(self, folder: str, root_dir: str = '.') -> str | None:🤖 Prompt for AI Agents
In src/scanoss/delta.py around line 130, the return annotation uses invalid
syntax ("str or None"); change the signature to use a proper union type like
"Optional[str]" or "str | None" (prefer Optional[str] for older Python
compatibility), and if you choose Optional[str] add "from typing import
Optional" to the module imports; update the function signature accordingly and
ensure any type-checking or docstrings reflect the new annotation.
SCANOSS SCAN Completed 🚀
View more details on SCANOSS Action Summary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (4)
src/scanoss/delta.py (4)
131-131: Fix type hint: use a proper union for the return type.- def create_delta_dir(self, folder: str, root_dir: str = '.') -> str or None: + def create_delta_dir(self, folder: str, root_dir: str = '.') -> Optional[str]:
143-170: Return the resolved directory path when a folder is provided.Currently you create “resolved” but return the original “folder”, which can be relative and inconsistent with the actual on-disk path and the printed location. Return the resolved path.
@@ if folder: # Resolve a relative folder under root_dir so checks/creation apply to the right place resolved = folder if os.path.isabs(folder) else os.path.join(root_dir, folder) resolved = os.path.normpath(resolved) @@ else: try: self.print_debug(f'Creating delta directory {resolved}...') os.makedirs(resolved) except (OSError, IOError) as e: self.print_stderr(f'ERROR: Failed to create directory {resolved}: {e}') return None + # Ensure we return the resolved path + folder = resolved @@ - return folder + return os.path.normpath(folder)
97-108: Block absolute/drive-qualified inputs and symlink escapes; refine traversal check.Current logic allows drive-qualified paths on Windows and follows symlinks that resolve outside root_dir. The traversal check also false-positives on filenames containing “..”.
Harden validation and use realpath for enforcement.
@@ - for source_file in files: - # Normalise the source path to handle ".." and redundant separators - normalised_source = os.path.normpath(source_file) - if '..' in normalised_source: - self.print_stderr(f'WARNING: Source path escapes root directory for {source_file}. Skipping.') - continue - # Resolve to the absolute path for source validation - abs_source = os.path.abspath(os.path.join(self.root_dir, normalised_source)) - # Check if the source file exists and is a file - if not os.path.exists(abs_source) or not os.path.isfile(abs_source): - self.print_stderr(f'WARNING: File {source_file} does not exist or is not a file, skipping') - continue + for source_file in files: + # Normalize and harden against traversal/absolute/drive-qualified inputs + normalised_source = os.path.normpath(source_file) + drive, _ = os.path.splitdrive(normalised_source) + if os.path.isabs(normalised_source) or drive: + self.print_stderr(f'WARNING: Absolute or drive-qualified path {source_file} not allowed. Skipping.') + continue + if normalised_source == '..' or normalised_source.startswith('..' + os.sep): + self.print_stderr(f'WARNING: Source path escapes root directory for {source_file}. Skipping.') + continue + # Resolve real paths and ensure source stays within root_dir (blocks symlink escapes) + real_root = os.path.realpath(self.root_dir) + abs_source = os.path.abspath(os.path.join(self.root_dir, normalised_source)) + real_source = os.path.realpath(abs_source) + if not real_source.startswith(real_root + os.sep): + self.print_stderr(f'WARNING: Resolved source {real_source} escapes root directory. Skipping.') + continue + if not os.path.isfile(real_source): + self.print_stderr(f'WARNING: File {source_file} does not exist or is not a file, skipping') + continue
124-125: Copy from the resolved real path to avoid symlink surprises.- self.print_debug(f'Copying {source_file} to {dest_path} ...') - shutil.copy(abs_source, dest_path) + self.print_debug(f'Copying {source_file} to {dest_path} ...') + shutil.copy(real_source, dest_path)
🧹 Nitpick comments (4)
src/scanoss/delta.py (4)
110-117: Ensure dest_path can’t be misread as absolute due to alt separators.Also strip alt separators when building the destination component.
- dest_path = os.path.normpath(os.path.join(self.root_dir, delta_folder, normalised_source.lstrip(os.sep))) + dest_path = os.path.normpath( + os.path.join(self.root_dir, delta_folder, normalised_source.lstrip('/\\')) + )
171-197: Normalize input lines and strip both separators cross‑platform.Handle CRLF/leading whitespace robustly and remove both “/” and “\” as leading separators.
@@ - with open(input_file, 'r', encoding='utf-8') as f: + with open(input_file, 'r', encoding='utf-8') as f: for line in f: - source_file = line.rstrip() + source_file = line.strip() if source_file: # Save the file path without any leading separators - files.append(source_file.lstrip(os.sep)) + leading_seps = os.sep + (os.path.altsep or '') + files.append(source_file.lstrip(leading_seps))
95-129: Don’t abort on first copy failure; continue and return partial success code.Fail-fast leaves a half-populated delta; prefer copying as much as possible and reporting a non-zero status if any error occurred.
@@ - # Process each file and copy it to the delta dir - for source_file in files: + # Process each file and copy it to the delta dir + errors = 0 + for source_file in files: @@ - except (OSError, shutil.Error) as e: - self.print_stderr(f'ERROR: Failed to copy {source_file} to {dest_path}: {e}') - return 1, '' - return 0, delta_folder + except (OSError, shutil.Error) as e: + self.print_stderr(f'ERROR: Failed to copy {source_file} to {dest_path}: {e}') + errors += 1 + continue + return (1 if errors else 0), delta_folder
41-50: Optional: consolidate flags into a config to satisfy linter PLR0913.Consider a small config dataclass or a single options dict for debug/trace/quiet/output to reduce parameter count.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/scanoss/delta.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/scanoss/delta.py (1)
src/scanoss/scanossbase.py (3)
ScanossBase(28-107)print_to_file_or_stdout(83-94)print_trace(65-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
SCANOSS SCAN Completed 🚀
View more details on SCANOSS Action Summary |
SCANOSS SCAN Completed 🚀
View more details on SCANOSS Action Summary |
|
Hi @Alex-1089 could you add a brief guide on what/how i should test this pr? |
Summary by CodeRabbit
New Features
Documentation