Description
dlt version
1.12.1
Describe the problem
This was encountered while implementing what is discussed in REST API Source > Define... not a REST endpoint.
The issue is in the function 'build_resource_dependency_graph', which is primarily used when creating REST API resources via dlt.sources.rest_api.__init__.rest_api_resources. The function stops processing the resource list as soon as it encounters a resource of type 'DltResource'. As a result, none of the resources listed after that resource in the 'resources' configuration list are returned to the caller. Because rest_api_resources passes this result to create_resources, the remaining resources are never created.
More specifically, 'build_resource_dependency_graph' breaks out of its loop after encountering such a resource, when it should use the 'continue' keyword to move on to the next resource in the list. See the linked line of code below:
dlt/dlt/sources/rest_api/config_setup.py
Line 309 in 3f2ed85
This is visible in the list of resources returned by 'rest_api_resources'.
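To make the difference concrete, here is a minimal, self-contained sketch of the loop pattern described above. It does not import dlt: 'FakeDltResource' is a hypothetical stand-in for 'DltResource', and 'build_graph_buggy' / 'build_graph_fixed' are hypothetical simplifications of 'build_resource_dependency_graph' that only collect entries by name. They illustrate the break/continue difference, not the real implementation.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Union


@dataclass
class FakeDltResource:
    """Hypothetical stand-in for dlt's DltResource (only the name matters here)."""
    name: str


ResourceEntry = Union[FakeDltResource, Dict[str, Any]]


def build_graph_buggy(resource_list: List[ResourceEntry]) -> Dict[str, Any]:
    """Mimics the reported behavior: `break` ends the loop at the first resource object."""
    resources: Dict[str, Any] = {}
    for entry in resource_list:
        if isinstance(entry, FakeDltResource):
            resources[entry.name] = entry
            break  # every entry after this one is silently dropped
        resources[entry["name"]] = entry
    return resources


def build_graph_fixed(resource_list: List[ResourceEntry]) -> Dict[str, Any]:
    """Proposed behavior: `continue` moves to the next entry instead of leaving the loop."""
    resources: Dict[str, Any] = {}
    for entry in resource_list:
        if isinstance(entry, FakeDltResource):
            resources[entry.name] = entry
            continue  # keep processing the rest of the list
        resources[entry["name"]] = entry
    return resources


config_resources: List[ResourceEntry] = [
    {"name": "issues"},
    FakeDltResource("processed_resource"),
    FakeDltResource("not_processed_resource"),
]

print(list(build_graph_buggy(config_resources)))  # ['issues', 'processed_resource']
print(list(build_graph_fixed(config_resources)))  # ['issues', 'processed_resource', 'not_processed_resource']
```

In the toy version, swapping 'break' for 'continue' is the whole fix; whether the real function needs any further changes is not established here.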
Expected behavior
All resources should be created, regardless of type, and it should be possible to include more than one non-endpoint resource. I mention the latter because, otherwise, one could simply put a single non-REST resource at the end of the resources list and everything would work.
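The ordering argument can be checked with the same kind of toy loop. The sketch below is again a hypothetical simplification (not dlt code; 'FakeDltResource', 'names_with_break' and 'survives_bug' are illustrative names). It shows that, with the 'break' behavior, a single non-endpoint resource placed last loses nothing, while with two non-endpoint resources the second is always lost, whatever the order.

```python
from itertools import permutations
from typing import Any, Dict, List, Union


class FakeDltResource:
    """Hypothetical stand-in for dlt's DltResource."""

    def __init__(self, name: str) -> None:
        self.name = name


Entry = Union[FakeDltResource, Dict[str, Any]]


def names_with_break(entries: List[Entry]) -> List[str]:
    """Collect entry names, stopping after the first resource object (the reported bug)."""
    names: List[str] = []
    for entry in entries:
        if isinstance(entry, FakeDltResource):
            names.append(entry.name)
            break
        names.append(entry["name"])
    return names


def all_names(entries: List[Entry]) -> List[str]:
    """Names of every configured entry, i.e. what should be created."""
    return [e.name if isinstance(e, FakeDltResource) else e["name"] for e in entries]


def survives_bug(entries: List[Entry]) -> bool:
    """True if the buggy loop still sees every configured entry."""
    return names_with_break(entries) == all_names(entries)


one_resource_last: List[Entry] = [{"name": "issues"}, FakeDltResource("processed_resource")]
one_resource_first: List[Entry] = [FakeDltResource("processed_resource"), {"name": "issues"}]
two_resources: List[Entry] = [
    {"name": "issues"},
    FakeDltResource("processed_resource"),
    FakeDltResource("not_processed_resource"),
]

print(survives_bug(one_resource_last))   # True: the "put it last" workaround
print(survives_bug(one_resource_first))  # False
# No ordering of two resource objects avoids losing the second one.
print(any(survives_bug(list(p)) for p in permutations(two_resources)))  # False
```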
Steps to reproduce
Here is a slightly-more-than-minimal example:
import dlt
from dlt.sources.rest_api import RESTAPIConfig, rest_api_resources


@dlt.resource()
def processed_resource():
    yield [{"name": "dlt"}, {"name": "verified-sources"}, {"name": "dlthub-education"}]


@dlt.resource()
def not_processed_resource():
    yield [{"name": "dlt"}, {"name": "verified-sources"}, {"name": "dlthub-education"}]


@dlt.source
def github_source():
    config: RESTAPIConfig = {
        "client": {"base_url": "https://github.com/api/v2"},
        "resources": [
            {
                "name": "issues",
                "endpoint": {
                    "path": "dlt-hub/{resources.processed_resource.repository}/issues/",
                    "params": {
                        "repository": '{resources.processed_resource.name}'
                    },
                },
                "include_from_parent": ["repository", "name"],
            },
            # Two non-endpoint resources: only the first one ends up in the source
            processed_resource(),
            not_processed_resource(),
        ],
    }
    yield from rest_api_resources(config)


def load_github() -> None:
    pipeline = dlt.pipeline(
        pipeline_name="rest_api_github",
        destination="duckdb",
        dataset_name="rest_api_data",
    )
    my_source = github_source()
    # Print the names of the resources the source actually created
    print([x for x in my_source.resources.keys()])


load_github()
>>>> ['processed_resource', 'issues']
Operating system
Linux
Runtime environment
Local
Python version
3.11
dlt data source
REST API
dlt destination
DuckDB
Other deployment details
No response
Additional information
No response