**Describe the bug**
When using the following:
```python
PARENT_TUNER = HyperparameterTuner.attach(
    tuning_job_name=PARENT_TUNING_JOB_NAME
)
```
...on a tuning job whose job definition has:
```
...
"StoppingCondition": {
    "MaxRuntimeInSeconds": 3600,
    "MaxWaitTimeInSeconds": 7200
},
"EnableNetworkIsolation": false,
"EnableInterContainerTrafficEncryption": false,
"EnableManagedSpotTraining": true
...
```
The `max_wait` and `use_spot_instances` settings are both `None`. I traced it back to `_prepare_init_params_from_job_description` in `sagemaker-python-sdk/src/sagemaker/estimator.py` (lines 811 to 874 at 481719f):
```python
def _prepare_init_params_from_job_description(cls, job_details, model_channel_name=None):
    """Convert the job description to init params that can be handled by the
    class constructor

    Args:
        job_details: the returned job details from a describe_training_job
            API call.
        model_channel_name (str): Name of the channel where pre-trained
            model data will be downloaded.

    Returns:
        dictionary: The transformed init_params
    """
    init_params = dict()

    init_params["role"] = job_details["RoleArn"]
    init_params["instance_count"] = job_details["ResourceConfig"]["InstanceCount"]
    init_params["instance_type"] = job_details["ResourceConfig"]["InstanceType"]
    init_params["volume_size"] = job_details["ResourceConfig"]["VolumeSizeInGB"]
    init_params["max_run"] = job_details["StoppingCondition"]["MaxRuntimeInSeconds"]
    init_params["input_mode"] = job_details["AlgorithmSpecification"]["TrainingInputMode"]
    init_params["base_job_name"] = base_from_name(job_details["TrainingJobName"])
    init_params["output_path"] = job_details["OutputDataConfig"]["S3OutputPath"]
    init_params["output_kms_key"] = job_details["OutputDataConfig"]["KmsKeyId"]
    if "EnableNetworkIsolation" in job_details:
        init_params["enable_network_isolation"] = job_details["EnableNetworkIsolation"]

    has_hps = "HyperParameters" in job_details
    init_params["hyperparameters"] = job_details["HyperParameters"] if has_hps else {}

    if "AlgorithmName" in job_details["AlgorithmSpecification"]:
        init_params["algorithm_arn"] = job_details["AlgorithmSpecification"]["AlgorithmName"]
    elif "TrainingImage" in job_details["AlgorithmSpecification"]:
        init_params["image_uri"] = job_details["AlgorithmSpecification"]["TrainingImage"]
    else:
        raise RuntimeError(
            "Invalid AlgorithmSpecification. Either TrainingImage or "
            "AlgorithmName is expected. None was found."
        )

    if "MetricDefinitons" in job_details["AlgorithmSpecification"]:
        init_params["metric_definitions"] = job_details["AlgorithmSpecification"][
            "MetricsDefinition"
        ]

    if "EnableInterContainerTrafficEncryption" in job_details:
        init_params["encrypt_inter_container_traffic"] = job_details[
            "EnableInterContainerTrafficEncryption"
        ]

    subnets, security_group_ids = vpc_utils.from_dict(job_details.get(vpc_utils.VPC_CONFIG_KEY))
    if subnets:
        init_params["subnets"] = subnets
    if security_group_ids:
        init_params["security_group_ids"] = security_group_ids

    if "InputDataConfig" in job_details and model_channel_name:
        for channel in job_details["InputDataConfig"]:
            if channel["ChannelName"] == model_channel_name:
                init_params["model_channel_name"] = model_channel_name
                init_params["model_uri"] = channel["DataSource"]["S3DataSource"]["S3Uri"]
                break

    return init_params
```
It seems `use_spot_instances` and `max_wait` do not get carried over to the newly created estimator.
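For illustration, here is a minimal sketch of the handling that appears to be missing (the helper name `add_spot_params` is hypothetical, not the actual SDK fix; the `job_details` keys follow the `DescribeTrainingJob` response shown above):

```python
def add_spot_params(init_params, job_details):
    # Hypothetical helper: copy managed-spot settings from a
    # DescribeTrainingJob response into the estimator init params.
    if "EnableManagedSpotTraining" in job_details:
        init_params["use_spot_instances"] = job_details["EnableManagedSpotTraining"]
    max_wait = job_details.get("StoppingCondition", {}).get("MaxWaitTimeInSeconds")
    if max_wait is not None:
        init_params["max_wait"] = max_wait
    return init_params

# Example using the job definition from this report
job_details = {
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600, "MaxWaitTimeInSeconds": 7200},
    "EnableManagedSpotTraining": True,
}
params = add_spot_params({}, job_details)
# params == {"use_spot_instances": True, "max_wait": 7200}
```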
**To reproduce**
See above.
**Expected behavior**
`use_spot_instances`, `max_wait`, etc. should all be carried over to the newly `attach()`ed tuner. This also affects warm start helpers like `identical_data_and_algorithm()`.
**Screenshots or logs**
If applicable, add screenshots or logs to help explain your problem.
**System information**
A description of your system. Please provide:
- **SageMaker Python SDK version**: v2.0.0
- **Framework name (eg. PyTorch) or algorithm (eg. KMeans)**:
- **Framework version**:
- **Python version**:
- **CPU or GPU**:
- **Custom Docker image (Y/N)**: N, official image classification image
**Additional context**
Add any other context about the problem here.