[B! mapreduce] restartr's Bookmarks

restartr id:restartr

restartr's bookmarks about mapreduce (44)

Good night, Posterous
Posterous Spaces is no longer available Thanks to all of my @posterous peeps. Y'all made this a crazy ride and it was an honor and pleasure working with all of y'all. Thanks to all of the users. Thanks to the academy. Nobody will read this.
restartr 2011/10/26
mapreduce

concurrent

pararell

programming
Link
http://strangeloop-riak-mapred.heroku.com/
restartr 2011/09/20
slide

strangeloop

riak

mapreduce
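Link
Hands-on! A Thorough Guide to "Text Mining with MapReduce"
Text mining "Aozora Bunko"! In the previous installment, "A Primer on Hadoop and Text Mining That You Were Afraid to Ask About", we covered the overview and architecture of Hadoop and text mining, how MapReduce works, and where Hadoop is put to use, and we set up a Hadoop execution environment. Starting with this installment, we will use Hadoop to write MapReduce programs for text mining. Do you know the site "Aozora Bunko"? Aozora Bunko is a website that publishes Japanese literary works whose copyright has expired, and its complete data set can be obtained on DVD or through BitTorrent distribution. This time, let's use this data for text mining. Last time I wrote that text classification can also estimate author attributes such as gender, age, region, and occupation; Aozora Bunko has author attributes that other data sets lack. For the works on Aozora Bunko, where the copyright has expired and the author has died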
restartr 2011/07/21
hadoop

mapreduce

textmining
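Link
Introducing BSP, Piccolo, and Spark: Distributed Processing Platforms Other than MapReduce - Preferred Networks Research & Development
Hi, this is Nakagawa; I actually joined the development team this year. I didn't have a photo of a cute dog, so I'll paste in an image of a cute mascot instead. Lately I hear a lot about MapReduce and its implementation, Hadoop. I think that is because there is that much demand to somehow process large volumes of data. But, obviously, MapReduce is not a silver bullet. So I'd like to briefly summarize what I presented at an internal TechTalk about distributed processing platforms that have caught my attention recently and that take a different approach from MapReduce. Bulk Synchronous Parallel This algorithm itself dates from 1990. The name is long, so I'll write BSP. Now, when finding the shortest path in a graph, can MapReduce be used? Papers like this one get published, so it is not that it can't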
restartr 2011/06/21
DistributedComputing

mapreduce

bsp

piccolo

spark

scala
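Link
HBase and Hadoop MR - 急がば回れ、選ぶなら近道
As a follow-up to my summary of the HBase study group, I'll write down how I see things going forward. First, as a premise: <General points> This applies not only to HBase but to NoSQL systems in general: I think you need to use them with the use case in mind. A recent trend, also conspicuous at Google, is that middleware is quite often developed to target a specific purpose. HBase is part of that trend too, so it may be worth keeping in mind. For HBase, the one to watch is probably Facebook. http://www.cloudera.com/resource/hw10_hbase_in_production_at_facebook In any case, I think information on use cases that are working fairly well is more valuable here than for other technologies. Basically, if you simply want to use a distributed KVS, the need to stick with HBase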
restartr 2011/06/20
hadoop

hbase

mapreduce
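Link
mapreduce for Looking at Access Logs in as Many Ways as Possible + Performance on NIFTY Cloud
mapreduce for Looking at Access Logs in as Many Ways as Possible + Performance on NIFTY Cloud { "author": "Muddy Dixon", "twitter": "@muddydixon", "place": "2nd Mongo DB JP Study Group in Tokyo" } Self-introduction University/graduate school: natural language processing: morpheme-sequence search and replace system; computer simulation of language development (something like ElmanNet+SOM) Work: search engine (ad optimization, design and development of a content-match engine, engineer support staff, data mining department Personal projects There are a huge number of services && the company is long-established, so there is no modern analytics platform There are a huge number of services && the company is long-established, so the log formats are "lovely" Because they were forced to fit together, the GET parameters have turned into chaos ?area=0013&bus=2&e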
restartr 2011/04/11
mongodb

mapreduce

accesslog

dataanalysis
Link
Disco MapReduce
Disco is a lightweight, open-source framework for distributed computing based on the MapReduce paradigm. Disco is powerful and easy to use, thanks to Python. Disco distributes and replicates your data, and schedules your jobs efficiently. Disco even includes the tools you need to index billions of data points and query them in real-time. Disco was born in Nokia Research Center in 2008 to solve rea
restartr 2011/03/10
python

mapreduce

framework
Link
Large Scale Map-Reduce Data Processing at Quantcast
Beat the Plan: Probabilistic Strategies for Successful Software Delivery at Scale Large-scale software delivery demands managing complexity across teams and organizations. Similarly to betting strategies in Vegas, embracing probabilistic thinking helps tackle uncertainty, shifting from rigid plans to adaptive systems. By making informed bets and designing for change, leaders can control volatility
restartr 2010/12/24
mapreduce

quantcast

hadoop
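Link
MapReduce in Scala
This article is day 9 of Scala Advent Calendar jp 2010. That said, I'm going to ignore the mood and do MapReduce anyway. Though only in a simplified form, I implemented in Scala the MapReduce framework familiar (?) from Google and Hadoop. So I'd like to go over the key points in implementing it and some handy features. What is MapReduce? It is a simple yet powerful programming model for large-scale distributed processing, proposed by Google. A product called Hadoop is available as open source, making large-scale distributed processing relatively easy to achieve. For a detailed explanation, I'll defer to other sites (HadoopWiki, @IT, image search results for mapreduce, and so on). Implementation (the source code is also on gist.) mapreduce.sca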
restartr 2010/12/15
scala

mapreduce

future
Link
Hive/JoinOptimization - Hadoop Wiki
1. Map Join Optimization 1.1 Using Distributed Cache to Propagate Hashtable File Previously, when 2 large data tables need to do a join, there will be 2 different Mappers to sort these tables based on the join key and emit an intermediate file, and the Reducer will take the intermediate file as input file and do the real join work. This join mechanism (referred to as Common Join) is perfect with t
restartr 2010/12/08
hive

performance

mapreduce
Link
http://jonhnnyweslley.net/blog/shadoop/
restartr 2010/10/24
*development

hadoop

scala

mapreduce
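Link
Machine Learning × MapReduce - ny23の日記
An introduction to some papers I looked over, less out of personal interest than in connection with chores. Parallelizing and distributing machine learning algorithms seems to be in vogue lately. This is far from exhaustive, but it may be useful to someone, so I'll note a few. First, the classic: Map-reduce for machine learning on multicore (NIPS 2006). Many classical machine learning algorithms (batch learning) can be described in the Statistical Query Model, and those can be written in summation form (and can therefore be parallelized with MapReduce). The implementation is Mahout. Recently, though, for most problems solvable with batch algorithms a corresponding online algorithm has been proposed, so there is little merit in parallelizing batch algorithms. With online algorithms the parameters are updated continuously, so MapR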
restartr 2010/10/07
*statistics

machinelearning

mapreduce

summary

machine learning
Link
Good Freely Available Textbooks on Machine Learning - MetaOptimize Q+A
I'm starting this question to collect a list of good freely available textbooks that concentrate on some aspect of machine learning (rather than applications of ML -- these can go in separate questions). Good tutorials could be included as well, but I'd rather not have, say, slides that are a bit cryptic on their own. The Elements of Statistical Machine Learning is a great text covering most core
restartr 2010/09/29
machinelearning

books

summary

mapreduce

statistics
Link
Designing algorithms for Map Reduce
Since the emerging of Hadoop implementation, I have been trying to morph existing algorithms from various areas into the map/reduce model. The result is pretty encouraging and I've found Map/Reduce is applicable in a wide spectrum of application scenarios. So I want to write down my findings but then found the scope is too broad and also I haven't spent enough time to explore different problem dom
restartr 2010/09/07
"Finally, I realize that there is no way for me to completely cover what Map/Reduce can do in all areas" "Notice that Map/Reduce is good for "data parallelism", which is different from "task parallelism". "

*development

hadoop

mapreduce

algorithm
Link
How will Google's Dremel change future Hadoop releases?
Answer (1 of 7): The architecture of Dremel is quite similar to the architecture of Pig and Hive. All three have a columnar layout for persistent data [7], a metadata repository, a query execution engine, a physical planner, a logical planner, and a query parser for a higher-level query language....
restartr 2010/08/11
*server

hadoop

google

demel

mapreduce
Link
GitHub - bsdfish/ScalaHadoop: A wrapper for Hadoop in Scala
This code provides some syntactic sugar on top of Hadoop in order to make it more usable from Scala. Take a look at Examples.scala for more details. A basic mapper looks like object TokenizerMap extends TypedMapper[LongWritable, Text, Text, LongWritable] { override def map(k: LongWritable, v: Text, context: ContextType) : Unit = v split " \t" foreach ((word) => context.write(word, 1L)) } or, you c
restartr 2010/07/28
*development

hadoop

mapreduce

scala
Link
Cloud9: A MapReduce Library for Hadoop
This page describes the code used to run experiments in the following paper: Jimmy Lin and Michael Schatz. Design Patterns for Efficient Graph Algorithms in MapReduce. Proceedings of the 2010 Workshop on Mining and Learning with Graphs Workshop (MLG-2010), July 2010, Washington, D.C. There's code in Cloud9 that illustrates three different design patterns for graph algorithms in MapReduce using Pag
restartr 2010/07/26
*server

hadoop

pagerank

mapreduce

algorithm
Link
Graph Processing in Map Reduce
In my previous post about Google's Pregel model, a general pattern of parallel graph processing can be expressed as multiple iterations of processing until a termination condition is reached. Within each iteration, same processing happens at a set of nodes (ie: context nodes). Each context node perform a sequence of steps independently (hence achieving parallelism) Aggregate all incoming messages
restartr 2010/07/22
Pregel as a solution to the various problems of using MapReduce for graphs.

graphDB

hadoop

mapreduce

nosql

pregel
Link
http://www.sandia.gov/~sjplimp/mapreduce.html
restartr 2010/05/19
*server

mpi

mapreduce

library
Link
I Looked into Riak a Bit - kuenishi's blog
At Python Hackathon #3, I looked into Riak, wondering whether I could use it as the backend for something I'm building at the moment, so here are my notes. Riak is an oddball: a Dynamo clone made by basho that exposes an HTTP/JSON interface and can also do MapReduce. But I'd think Consistent Hashing and MapReduce are a really poor match, so my question is how that works out. For now, the interface is JSON/HTTP, but there is also an Erlang API. Riak's primary programming interface is JSON over (RESTful) HTTP, which is as close as you can come these days to a universal language and protocol
restartr 2010/05/19
What you can actually do with it isn't much different from CouchDB+Lounge. I expect it becomes quite powerful when combined with something like Webmachine.

*server

riak

nosql

mapreduce
Link