REST - non-endpoint resource not created - Fix Identified · Issue #2795 · dlt-hub/dlt · GitHub
Open
@wischmcj

Description

dlt version

1.12.1

Describe the problem

This was encountered while implementing what is discussed in REST API Source > Define... not a REST endpoint.
The issue is in the function build_resource_dependency_graph, which is used primarily when creating REST API resources via dlt.sources.rest_api.__init__.rest_api_resources. The function stops processing immediately after handling any resource of type DltResource, so none of the resources listed after it in the 'resources' configuration list are returned to the calling function. Because rest_api_resources passes this result to create_resources, the remaining resources are never created.
More specifically, 'build_resource_dependency_graph' breaks out of its loop after encountering such a resource, when it should instead use the continue keyword to move on to the next resource in the list. See the below linked line of code:

This is evidenced by the contents of the resource list returned by 'rest_api_resources'.
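
The difference between the two keywords can be sketched in isolation. The snippet below is a simplified illustration, not the dlt implementation: `StandInResource`, `collect_with_break`, and `collect_with_continue` are hypothetical names, and the loop only mimics the shape of the behavior described above.

```python
class StandInResource:
    """Hypothetical stand-in for a resource object (not the real dlt class)."""

    def __init__(self, name):
        self.name = name


def collect_with_break(resources):
    """Mimic the reported behavior: stop at the first non-endpoint resource."""
    collected = []
    for res in resources:
        if isinstance(res, StandInResource):
            collected.append(res.name)
            break  # leaves the loop, so later entries are never visited
        collected.append(res["name"])
    return collected


def collect_with_continue(resources):
    """Mimic the proposed behavior: skip ahead to the next list entry."""
    collected = []
    for res in resources:
        if isinstance(res, StandInResource):
            collected.append(res.name)
            continue  # moves on to the next entry in the list
        collected.append(res["name"])
    return collected


# Same ordering as the 'resources' list in the reproduction example.
resources = [
    {"name": "issues"},
    StandInResource("processed_resource"),
    StandInResource("not_processed_resource"),
]

print(collect_with_break(resources))
print(collect_with_continue(resources))
```

With `break`, the second stand-in resource is dropped; with `continue`, all three names are collected.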

Expected behavior

All resources should be created, regardless of type, and it should be possible to include more than one non-endpoint resource. I mention the latter because, otherwise, one could just put the single non-REST resource at the end of the resources list and everything would work.

Steps to reproduce

Here is a slightly-more-than-minimal example:

import dlt
from dlt.sources.rest_api import RESTAPIConfig, rest_api_resources

@dlt.resource()
def processed_resource():
    yield [{"name": "dlt"}, {"name": "verified-sources"}, {"name": "dlthub-education"}]

@dlt.resource()
def not_processed_resource():
    yield [{"name": "dlt"}, {"name": "verified-sources"}, {"name": "dlthub-education"}]

@dlt.source
def github_source():
    config: RESTAPIConfig = {
        "client": {"base_url": "https://github.com/api/v2"},
        "resources": [
            {
                "name": "issues",
                "endpoint": {
                    "path": "dlt-hub/{resources.processed_resource.repository}/issues/",
                    "params": {
                        "repository": '{resources.processed_resource.name}'
                    },
                },
                "include_from_parent": ["repository", "name"],
            },
            processed_resource(),
            not_processed_resource(),
        ],
    }

    yield from rest_api_resources(config)

def load_github() -> None:
    pipeline = dlt.pipeline(
        pipeline_name="rest_api_github",
        destination="duckdb",
        dataset_name="rest_api_data",
    )
    my_source = github_source()
    print([x for x in my_source.resources.keys()])

load_github()
# prints: ['processed_resource', 'issues']
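
One way to make the symptom explicit is to compare the resource names that were expected with the names actually returned. The sketch below uses only the names from the example above and the output reported in this issue; the helper `missing_resources` is hypothetical and not part of dlt.

```python
def missing_resources(expected, actual):
    """Return the expected resource names that are absent from actual, in order."""
    actual_names = set(actual)
    return [name for name in expected if name not in actual_names]


# Names declared in the 'resources' list of the reproduction example.
expected = ["issues", "processed_resource", "not_processed_resource"]
# Names reported in this issue as the printed output.
reported = ["processed_resource", "issues"]

print(missing_resources(expected, reported))
```

A check of this shape could be turned into a regression test once the loop is fixed, by asserting that the returned list is empty.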

Operating system

Linux

Runtime environment

Local

Python version

3.11

dlt data source

REST API

dlt destination

DuckDB

Other deployment details

No response

Additional information

No response
