[B! presto] yass's bookmarks

yass id:yass

yass's bookmarks about presto (27)

Amazon Athena, a managed service based on Presto
yass 2017/08/26
Athena

presto
A Benchmark Test on Presto, Spark Sql and Hive on Tez
We tested the performance of Presto, Spark SQL, and Hive on Tez, including the execution speed of common query patterns, on data ranging from tens of thousands to billions of rows. We conducted a benchmark test on mainstream big data SQL engines including Presto, Spark SQL, and Hive on Tez. We focused on the performance over medium data (from tens of GB to 1 TB), which is the major case used in most services.
yass 2016/11/26
presto

Spark SQL

tez

Hive

benchmark

hadoop
Presto anatomy
Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes. It is written in Java and uses a pluggable backend. Presto is fast due to code generation and runtime compilation techniques. It provides a library and framework for building distributed services and fast Java collections. Plugins all
yass 2015/09/23
presto
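Miscellaneous thoughts on Presto - wyukawa's diary
I'd like to write down what I've noticed after operating Presto for about a year. There is no doubt that Presto is an excellent OSS product, and I think anyone using Hive has nothing to lose by installing it. The benefits are as follows. It processes data in memory, so it is faster than Hive and well suited to ad hoc queries. It is stable. Its architecture has no storage of its own, so updates are easy. Development is active: releases have slowed compared with before, but a new version still comes out about once every three weeks, and when you report a bug a fixed version is released within a few days. Development is open: pull requests are accepted and code review is thorough. The code is clean, and I personally consider it a model of modern Java. Judging from recent changes, Presto appears to prioritize stability, which for an administrator like me means a lighter operational load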
yass 2015/09/09
presto

hadoop
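SQL on Hadoop comparative evaluation [evaluation report as of November 2014]
Presentation slides from Impala Meetup 2014/10/31 @Tokyo. [Note] The evaluation results presented in this document date from 2014. The software in question evolves and improves quickly, and current versions differ substantially in functionality and performance. If you are interested in services or system integration based on the latest SQL on Hadoop information, please contact OSS Professional Services in NTT DATA's platform systems division (email: hadoop [AT] kits.nttdata.co.jp).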
yass 2014/11/05
Hadoop

benchmark

comparison

hive

Impala

sql

presto

impala

tez
SQL on Hadoop in Taiwan
This document discusses SQL engines for Hadoop, including Hive, Presto, and Impala. Hive is best for batch jobs due to its stability. Presto provides interactive queries across data sources and is easier to manage than Hive with Tez. Presto's distributed architecture allows queries to run in parallel across nodes. It supports pluggable connectors to access different data stores and has language bi
yass 2014/09/27
presto

hadoop

sql
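A few light notes on Presto, Ansible, and related topics - wyukawa's diary
Today I'd like to write a few light notes on Presto, Ansible, and related topics. I can't go into much depth, so please bear with me. In my environment we use Presto, and Presto is co-located with the DataNode and NodeManager. The main use case is running ad hoc queries: when we want to build some report, we use it to check the contents of the data. Previously we used Hive for this, but Hive runs as MapReduce and is slow (although local mode is sometimes enough), so Presto's speed is a welcome improvement. That said, this is partly because my environment deals with small data; for something like running a select against several hundred GB of compressed data, I think even Presto would be slow. Another quietly nice thing is that the Presto CLI displays column names, so you can immediately tell which data belongs to which column
yass 2014/08/03
" I used to think a separate RDBMS for aggregation was necessary, but for cases that simply select aggregated data, Presto is fast enough, so I'm starting to think we might not need an aggregation RDBMS. "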

ansible

presto

hive

hadoop
War of the Hadoop SQL engines. And the winner is ...? - Sonra
War of the Hadoop SQL engines. And the winner is …? You may have wondered why we were quiet over the last couple of weeks? Well, we locked ourselves into the basement and did some research and a couple of projects and PoCs on Hadoop, Big Data, and distributed processing frameworks in general. We were also looking at Clickstream data and Web Analytics solutions. Over the next couple of weeks we wil
yass 2014/07/28
" Right now I would run both batch style queries (ETL) and interactive queries on Hive Tez as Hive offers the richest SQL feature set, especially analytic functions and supports a wide set of file formats. "

hadoop

sql

hive

tez

impala

presto

spark

infinidb

drill
MPP on Hadoop, Redshift, BigQuery - Go ahead!
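The pressure on Twitter to "hurry up and write about the rough differences in how the currently popular MPPs are used!" is intense, so I'll write something casually. This article is based on my own experience and on what I've heard from users at meetups and the like, so not all of it is first-hand (especially BigQuery). If you ask each vendor's SAs, they may tell you about better approaches and more details. I have never used on-premises commercial MPPs, so no comment there. Presto is the main focus for MPP on Hadoop because it is what I use most right now, and I think Impala and other MPP-on-Hadoop systems are roughly similar. Of course there are implementation differences, so please fill those in yourself as appropriate. Premise: you are developing an application and building an analytics platform for it from scratch. Quick summary: if you can build a place to store the data, then something that can query it directly, Pre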
yass 2014/07/24
BigQuery

RedShift

Impala

Presto

Hadoop

mpp
Cloudera Blog
We are thrilled to announce the general availability of the Cloudera AI Inference service, powered by NVIDIA NIM microservices, part of the NVIDIA AI Enterprise platform, to accelerate generative AI deployments for enterprises. This service supports a range of optimized AI models, enabling seamless and scalable AI inference. Background The generative AI landscape is evolving […]
yass 2014/05/30
" Shark required more memory than available in the cluster to run the Reporting and Deep Analytics queries on RDDs (and thus those queries could not be completed) "

impala

hive

tez

shark

spark

presto

parquet

orcfile

benchmark

hadoop
Read Data in Parquet File Format by zhenxiao · Pull Request #1147 · prestodb/presto
yass 2014/05/16
presto

parquet
Netflix running Presto in the AWS Cloud
Netflix runs Presto in its AWS cloud environment to enable low-latency ad-hoc queries on petabyte-scale data stored in S3. Some key things Netflix did include optimizing Presto to read from and write directly to S3, fixing bugs, integrating Presto with its EMR and Ganglia monitoring, and deploying a 100+ node Presto cluster that handles over 1000 queries per day. Performance testing showed Presto
yass 2014/05/16
Netflix

presto

SequenceFile

parquet

hadoop
Presto in the cloud
yass 2014/05/16
Qubole

presto

hadoop

cloud
Announcing General Availability of Presto-as-a-Service | Qubole
yass 2014/04/30
Qubole

hadoop

presto

prestodb
Real-time Analytics Database | CrateDB
Execute ad-hoc queries on billions of records in milliseconds. Columnar storage guarantees ultra-fast aggregations, enabling instant data-driven decisions. Begin with a simple query and delve into complex data relationships, revealing trends and patterns across diverse data types. Hybrid Search Effortless search across structured, semi-structured, geospatial, and vector data. Perform full-text, ve
yass 2014/04/19
" Crate Data is a distributed system that runs on one machine or a cluster of machines. Crate comes in one complete install package. It includes solid established open source components (Presto, Elasticsearch, Lucene, Netty) "

sql

presto

netty

lucene

elasticsearch

distributed
Presto Performance - Qubole Engineering Posts - Quora
Presto is an open source distributed SQL query engine, developed by Facebook. Presto was designed and written from the ground up for interactive analytics and approaches the speed of commercial data warehouses. Qubole started its Presto-as-a-Service program a few weeks ago to make it easily acces...
yass 2014/04/14
" Presto showed a speedup of 2-7.5x over Hive for these queries. "

presto

hadoop

hive

Qubole
What are the main differences between Facebook, Presto, and Amplab Shark?
Answer (1 of 2): 1. Primary Use Case: While both are intended for analytics, Shark's primary use case is providing SQL to an (extremely fast) in-memory database, with support also for on-disk (or abstract) data sources. Presto is designed to be a fast SQL engine for the latter, and does not have ...
yass 2014/02/14
" Presto has implemented some approximate aggregation operators with hard-coded characteristics (99% confidence intervals, fixed sampling, see BlinkDB "

presto

shark

spark

impala

comparison

blinkdb
Presto-as-a-Service: Interactive SQL Execution on AWS
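Practical Guide to Building an API Back End with Spring Boot, 2nd Edition. Thousands of developers have learned the basics of building REST APIs with Spring Boot from InfoQ's mini-book "Practical Guide to Building an API Back End with Spring Boot". The book uses Spring Boot 2, the version newly released at the time of publication. However, Spring Boot 3 was recently released, and important chan...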
yass 2014/02/10
" It already has 2,000 stars and 350 forks on GitHub, making it more popular than similar projects such as Impala. "

presto

hadoop

qubole
https://www.xtendsys.net/blog/post-1
yass 2014/02/09
presto

facebook

hadoop

hive
Hardware requirements for Presto
Most people are running Trino (formerly PrestoSQL) on the Hadoop nodes they already have. At Facebook we typically run Presto on a few nodes within the Hadoop cluster to spread out the network load. Generally, I'd go with the industry standard ratios for a new cluster: 2 cores and 2-4 gig of memory for each disk, with 10 gigabit networking if you can afford it. After you have a few machines (4+),
yass 2014/01/24
" At Facebook / We run our JVMs with a 16 gigabyte heap to leave most memory available for OS buffers / On the machines we run Presto we don't run MapReduce tasks / Most of the Presto machines we are on have 16 real cores and we use processor affinity to limit Presto to 12 cores "

presto

Facebook

hardware

server