rifkiamil (Rif) · GitHub

rifkiamil/README.md
  • 👋 Hi, I’m @rifkiamil
  • 👀 I’m interested in ... data, databases and Enterprise IT
  • 🐮 Build websites with 🐮 faces http://🐮🐮🐮.ws
  • 🌱 I’m currently learning ... Python, Node.js
  • 💪 Slowly building an app to understand data from the Strong App
  • 💞️ As a Google Developer Expert, I teach minorities and marginalised groups about Data, SQL & Google BigQuery using sports and blockchain data, with the support of Google.
  • 📫 How to reach me ... @rifkiamil on Twitter or email rif atsymbol kiamil.com

Pinned

  1. DATASET bigquery-public-data DB crypto_ethereum - quickdatabasediagrams

     "bigquery-public-data.crypto_ethereum.balances" as Bal
     -
     address STRING
     eth_balance NUMERIC
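     For context, a minimal BigQuery query against the table this diagram describes might look like the sketch below (assuming, as in the public dataset, that eth_balance is denominated in wei):

     ```sql
     -- Top 10 Ethereum addresses by balance, converting wei to ETH.
     SELECT
       address,
       eth_balance / 1e18 AS balance_eth
     FROM `bigquery-public-data.crypto_ethereum.balances`
     ORDER BY eth_balance DESC
     LIMIT 10
     ```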
  2. Satoshi Nakamoto sent 50 BTC to Hal Finney in block 170

     # Find Bitcoin transaction
     SELECT
       bt.hash,
       bt.block_timestamp,
       CAST(bi.value AS NUMERIC)/100000000 as InputValueBTC,
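     The pinned preview cuts off after five lines; a complete query in the same spirit (the FROM and WHERE clauses here are assumptions, not the gist's exact text) might be:

     ```sql
     # Find the block-170 transaction in which Satoshi sent 50 BTC to Hal Finney.
     SELECT
       bt.hash,
       bt.block_timestamp,
       CAST(bi.value AS NUMERIC)/100000000 AS InputValueBTC
     FROM `bigquery-public-data.crypto_bitcoin.transactions` AS bt,
       UNNEST(bt.inputs) AS bi
     WHERE bt.block_number = 170
     ```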
  3. SQL Bitcoin Part 2 - Find transactions with block_timestamp, block_timestamp_month, output_value WHERE block_timestamp_month

     SELECT block_timestamp, block_timestamp_month, output_value
     FROM `bigquery-public-data.crypto_bitcoin.transactions`
     WHERE block_timestamp_month = '2013-12-01'
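     Worth noting: block_timestamp_month is the column this public table is partitioned on, so the WHERE filter prunes the scan to a single monthly partition. A variant that narrows the result further (the output_value threshold is an illustrative assumption):

     ```sql
     SELECT block_timestamp, output_value
     FROM `bigquery-public-data.crypto_bitcoin.transactions`
     WHERE block_timestamp_month = '2013-12-01'
       AND output_value > 5000000000  -- more than 50 BTC, in satoshi
     ```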
  4. Google Sign-In with Google Play Console and Firebase - configuration problem related to SHA-1 fingerprints

     Process to ensure that both your debug and release builds (including those distributed via Google Play) are correctly recognized by Firebase for Google Sign-In. Optional reading: [SHA1 key hell regarding Gmail authentication and sign-in](https://mvolkanyurtseven.medium.com/sha1-key-hell-regarding-gmail-authentication-and-sign-in-e710baef5071).

     ---

     ## Overview
  5. Trino JSON Processing in Data Lakes: Engine Internals, Comparisons, and SIMD Aspects

     # Trino JSON Processing in Data Lakes: Engine Internals, Comparisons, and SIMD Aspects

     ## Engine Internals and JSON Processing Enhancements

     **Trino’s JSON architecture:** Trino (formerly PrestoSQL) is a distributed MPP query engine where workers scan data in parallel and pipeline results in memory. JSON in a data lake (e.g. files on S3 or HDFS) is typically handled via the Hive connector, which treats JSON files as *line-oriented* text. Each JSON object (or array) is expected to be a record, often one JSON object per line (NDJSON). Trino splits large JSON files into segments for parallel reading, aligning splits on record boundaries (usually newline delimited) so that no JSON object is cut in half between workers. This ensures each split contains whole JSON records for valid parsing. Internally, Trino uses a **LinePageSource** to read text files and find record boundaries (e.g. newline positions) so that each worker thread reads a chunk of the file and emits complete JSON rows. For extremely large JSON objects that might span multiple lines, Trino’s reader will treat them as a single record (by reading past split bounds until the closing brace is found), guaranteeing correctness across split boundaries. The result is that JSON files can be scanned in parallel by many workers, each processing a different byte range of the file, which greatly increases throughput in a data lake environment.
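     To make that concrete, the sketch below (catalog, schema, and bucket names are hypothetical) shows the usual Hive-connector setup: format = 'JSON' tells Trino to parse each line of the external files as one JSON record, which is what lets workers split the files on newline boundaries and scan them in parallel.

     ```sql
     -- Hypothetical names throughout; each line of the S3 files is
     -- expected to hold one complete JSON object (NDJSON).
     CREATE TABLE hive.lake.events (
         user_id    VARCHAR,
         event_type VARCHAR,
         ts         TIMESTAMP
     )
     WITH (
         format = 'JSON',
         external_location = 's3://example-bucket/events/'
     );

     -- Workers each scan a byte range of the files, aligned on record
     -- boundaries, and emit whole JSON rows.
     SELECT event_type, count(*) AS n
     FROM hive.lake.events
     GROUP BY event_type;
     ```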
  6. Trino's Efficiency in Processing JSON Files in Data Lakes: A Technical Deep Dive

     # **Trino's Efficiency in Processing JSON Files in Data Lakes: A Technical Deep Dive**

     ## **Introduction**

     Trino, the distributed SQL query engine formerly known as PrestoSQL, is engineered for high-performance, interactive analytics across a multitude of heterogeneous data sources [1]. Its architecture is particularly well-suited for querying large datasets residing in data lakes, whether deployed on-premises using HDFS or in the cloud on object storage systems like Amazon S3, Google Cloud Storage, or Azure Blob Storage [2]. A key capability enabling this is Trino's schema-on-read approach, allowing users to query data in various formats directly where it resides, without requiring upfront transformation and loading into a proprietary storage system [3].
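     As a quick illustration of schema-on-read (table and field names are hypothetical), Trino can project fields out of raw JSON strings at query time, with no transformation or load step beforehand:

     ```sql
     -- Extract fields from a raw JSON column at query time using
     -- Trino's JSON path functions; the schema is applied on read.
     SELECT
       json_extract_scalar(raw, '$.user.id')    AS user_id,
       json_extract_scalar(raw, '$.event.type') AS event_type
     FROM hive.lake.raw_events
     LIMIT 10;
     ```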