Creating a Job

Create and run a job in Data Science.

Ensure that you have created the necessary policies, authentication, and authorization for your jobs.

Before you begin:

  • Create a job artifact file or build a custom container.

  • To store and manage job logs, learn about logging.

  • To use storage mounts, you must have an Object Storage bucket or OCI File Storage Service (FSS) mount target and export path.

    To use FSS, you must first create the file system and the mount point. Use the custom networking option and ensure that the mount target and the job are configured with the same subnet. Configure security list rules for the subnet with the specific ports and protocols (see the sketch after this list).

    Ensure that sufficient service limits are allocated for file-system-count and mount-target-count.

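The security list rules that FSS needs are the standard NFS ports. As a rough sketch (assuming the OCI Python SDK, the NFS ports that FSS uses, and placeholder values for the security list OCID and subnet CIDR; verify the ports and rules against the FSS documentation for your VCN before applying anything), appending the ingress rules might look like this:

    # A minimal sketch, assuming the OCI Python SDK ("pip install oci") and the
    # standard NFS ports that FSS uses (TCP 111 and 2048-2050, UDP 111 and 2048).
    # The security list OCID and subnet CIDR are placeholders; corresponding
    # egress rules are also required but are omitted here for brevity.
    import oci

    config = oci.config.from_file()  # reads ~/.oci/config
    network = oci.core.VirtualNetworkClient(config)

    SECURITY_LIST_ID = "<security_list_ocid>"  # placeholder
    SUBNET_CIDR = "10.0.0.0/24"  # placeholder: subnet shared by mount target and job

    def tcp_rule(port_min, port_max):
        return oci.core.models.IngressSecurityRule(
            protocol="6",  # TCP
            source=SUBNET_CIDR,
            tcp_options=oci.core.models.TcpOptions(
                destination_port_range=oci.core.models.PortRange(min=port_min, max=port_max)
            ),
        )

    def udp_rule(port_min, port_max):
        return oci.core.models.IngressSecurityRule(
            protocol="17",  # UDP
            source=SUBNET_CIDR,
            udp_options=oci.core.models.UdpOptions(
                destination_port_range=oci.core.models.PortRange(min=port_min, max=port_max)
            ),
        )

    # Keep the existing rules and append the NFS-related ones.
    existing = network.get_security_list(SECURITY_LIST_ID).data
    rules = existing.ingress_security_rules + [
        tcp_rule(111, 111), tcp_rule(2048, 2050),
        udp_rule(111, 111), udp_rule(2048, 2048),
    ]
    network.update_security_list(
        SECURITY_LIST_ID,
        oci.core.models.UpdateSecurityListDetails(ingress_security_rules=rules),
    )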

  • Add basic information for the job you're creating.
    1. From the jobs list page, select Create job. If you need help finding the list of jobs, see Listing Jobs.
    2. Select Single Node to run the job on a single machine, or Multi Node for demanding jobs that run across several nodes.
    3. (Optional) Select a different compartment for the job.
    4. (Optional) Enter a name and description for the job (limit of 255 characters). If you don't provide a name, a name is automatically generated.

      For example, job20210808222435

    5. Continue with the configuration steps for the node type that you selected: Using the Console for Single Node Jobs or Using the Console for Multi Node Jobs.
  • Use the Data Science CLI to create a job, as in the following example:

    1. Create a job with:
      oci data-science job create \
      --display-name <job_name> \
      --compartment-id <compartment_ocid> \
      --project-id <project_ocid> \
      --configuration-details file://<jobs_configuration_json_file> \
      --infrastructure-configuration-details file://<jobs_infrastructure_configuration_json_file> \
      --log-configuration-details file://<optional_jobs_logging_configuration_json_file>
    2. Use a jobs configuration JSON file such as this one. The environmentVariables and commandLineArguments entries control the job at run time (see the runtime sketch after these steps):
      {
        "jobType": "DEFAULT",
        "maximumRuntimeInMinutes": 240,
        "commandLineArguments" : "test-arg",
        "environmentVariables": {
          "SOME_ENV_KEY": "some_env_value" 
        }
      }
    3. Use this jobs infrastructure configuration JSON file:
      {
        "jobInfrastructureType": "STANDALONE",
        "shapeName": "VM.Standard2.1",
        "blockStorageSizeInGBs": "50",
        "subnetId": "<subnet_ocid>"
      }
    4. (Optional) Use this jobs logging configuration JSON file:
      {
        "enableLogging": true,
        "enableAutoLogCreation": true,
        "logGroupId": "<log_group_ocid>"
      }
    5. Upload a job artifact file for the job you created with:
      oci data-science job create-job-artifact \
      --job-id <job_ocid> \
      --job-artifact-file <job_artifact_file_path> \
      --content-disposition "attachment; filename=<job_artifact_file_name>"
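
    At run time, the values from the jobs configuration file surface inside the job as ordinary environment variables and command-line arguments. A minimal sketch of a Python job artifact that consumes the sample configuration in step 2 (the script body here is illustrative):

      # A minimal sketch of a Python job artifact that reads the values
      # defined in the sample jobs configuration: SOME_ENV_KEY arrives as an
      # environment variable and "test-arg" as a command-line argument.
      import os
      import sys

      def main():
          some_value = os.environ.get("SOME_ENV_KEY", "<unset>")
          args = sys.argv[1:]  # ["test-arg"] with the sample configuration
          print(f"SOME_ENV_KEY={some_value}")
          print(f"arguments={args}")

      if __name__ == "__main__":
          main()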
  • The ADS SDK is also a publicly available Python library that you can install with this command:

    pip install oracle-ads

    It provides wrappers that make creating and running jobs from notebooks or from your client machine easy.

    Use the ADS SDK to create and run jobs.
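
    As a minimal sketch (assuming the ads.jobs API from oracle-ads; the OCIDs, shape, and script name are placeholders to replace with your own values):

      # A minimal sketch using the ads.jobs API from oracle-ads.
      from ads.jobs import Job, DataScienceJob, ScriptRuntime

      job = (
          Job(name="my-job")
          .with_infrastructure(
              DataScienceJob()
              .with_compartment_id("<compartment_ocid>")
              .with_project_id("<project_ocid>")
              .with_shape_name("VM.Standard2.1")
              .with_block_storage_size(50)
              .with_log_group_id("<log_group_ocid>")
          )
          .with_runtime(
              ScriptRuntime()
              .with_source("job_script.py")  # the job artifact
              .with_environment_variable(SOME_ENV_KEY="some_env_value")
              .with_argument("test-arg")
          )
      )

      job.create()     # create the job
      run = job.run()  # start a job run
      run.watch()      # stream the job run logs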