Home - HADOOP2 - Apache Software Foundation


Home
Created by ASF Infrabot, last modified by Akira Ajisaka on Dec 10, 2021
This HADOOP2 space was migrated from the old Hadoop wiki. Please check https://cwiki.apache.org/confluence/display/HADOOP for current information.

Apache Hadoop

Apache Hadoop is a framework for running applications on large clusters built of commodity hardware. The Hadoop framework transparently provides applications with both reliability and data motion. Hadoop implements a computational paradigm named Map/Reduce, in which the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, it provides a distributed file system (HDFS) that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both MapReduce and the Hadoop Distributed File System are designed so that node failures are automatically handled by the framework.

General Information
- Official Apache Hadoop Website: download, bug tracking, mailing lists, etc.
- Overview of Apache Hadoop
- FAQ: Frequently Asked Questions
- What Hadoop is not
- Distributions and Commercial Support for Hadoop (RPMs, Debs, AMIs, etc.)
- Presentations, books, articles and papers about Hadoop
- PoweredBy, a growing list of sites and applications powered by Apache Hadoop

Support
- Getting help from the Hadoop community
- People and companies for hire
- Hadoop Community Events and Conferences
- HadoopUserGroups (HUGs)

Related Projects
- HBase, a Bigtable-like structured storage system for Hadoop HDFS
- Apache Pig, a high-level data-flow language and execution framework for parallel computation, built on top of Hadoop Core
- Hive, a data warehouse infrastructure that allows SQL-like ad hoc querying of data (in any format) stored in Hadoop
- ZooKeeper, a high-performance coordination service for distributed applications
- Hama, a distributed computing framework similar to Google's Pregel, based on BSP (Bulk Synchronous Parallel) computing techniques for massive scientific computations
- Mahout, scalable machine learning algorithms using Hadoop
- Hadoop Compatible FileSystems (HCFS)
- Apache Gora, an open source framework that provides an in-memory data model and persistence for big data. Gora supports persisting to column stores, key-value stores, document stores and RDBMSs, and analyzing the data with extensive Apache Hadoop MapReduce support.

User Documentation
- Available Java Runtime Environments for Hadoop
- ImportantConcepts
- GettingStartedWithHadoop (lots of details and explanation)
- QuickStart (for those who just want it to work now)
- Command Line Options for the Hadoop shell scripts
- Hadoop Code Overview
- Troubleshooting: what to do when things go wrong

Setting up a Hadoop Cluster
- Starting a Single-Node Hadoop Cluster
- HowToConfigure Hadoop software
- WebApps for monitoring your system
- Configure NameNode High-Availability
- How to get metrics into Ganglia
- Tips for managing a large cluster
- Disk Setup: some suggestions
- Topology Scripts / Rack Awareness
- Build and Install Hadoop 2.2 or newer on Windows

Virtual Clusters including Amazon AWS
- Virtual Hadoop: the theory
- How to set up a Virtual Cluster
- Running Hadoop on AmazonEC2
- Running Hadoop with AmazonS3

Tutorials
- Running Hadoop On Ubuntu Linux (Single-Node Cluster): tutorial by Michael Noll on installing, configuring and running Hadoop on a single Ubuntu Linux machine
- Running Hadoop On Ubuntu Linux (Multi-Node Cluster): tutorial by Michael Noll on how to set up a multi-node Hadoop cluster
- Cloudera basic training
- Hadoop Windows/Eclipse Tutorial: how to develop Hadoop with Eclipse on Windows
- Yahoo! Hadoop Tutorial: Hadoop setup, HDFS, and MapReduce
- Running Hadoop on Mac OS X (Multi-Node Cluster): tutorial on how to set up a multi-node Hadoop cluster on Macintosh OS X (Lion)

MapReduce
The MapReduce algorithm is the foundational algorithm of Hadoop, and is critical to understand.
- HadoopMapReduce
- HadoopMapRedClasses
- HowManyMapsAndReduces
- TaskExecutionEnvironment
- HowToDebugMapReducePrograms

Examples
- WordCount
- Python Word Count
- C/C++ Word Count
- Grep
- Sort
- RandomWriter
- How to read from and write to HDFS

Benchmarks
- Hardware benchmarks
- Data processing benchmarks

Contributed parts of the Hadoop codebase
These are independent modules that are in the Hadoop codebase but are not (yet) tightly integrated with the main project.
- HadoopStreaming (useful for using Hadoop with other programming languages)
- DistributedLucene, a proposal for a distributed Lucene index in Hadoop
- MountableHDFS: Fuse-DFS and other tools to mount HDFS as a standard filesystem on Linux (and some other Unix OSs)
- HDFS-APIs in Perl, Python, PHP and other languages
- Chukwa, a data collection, storage, and analysis framework
- The Apache Hadoop Plugin for Eclipse (an Eclipse plug-in that simplifies the creation and deployment of MapReduce programs, with an HDFS administrative feature)
- HDFS-RAID: erasure coding in HDFS

Developer Documentation
- Roadmap, listing release plans
- HowToContribute
- HowToUseInjectionFramework
- HowToUseSystemTestFramework
- HowToSetupYourDevelopmentEnvironment
- HowToUseConcurrencyAnalysisTools
- GithubIntegration
- HowToUseJCarder
- HowToCodeReview
- Jira usage guidelines
- HowToCommit
- HowToRelease
- HudsonBuildServer
- HowToSetupUbuntuBuildMachine
- DevelopmentHints
- ProjectSuggestions
- Building/Testing under IntelliJ IDEA
- Git And Hadoop
- ProjectSplit

Related Resources
- Nutch Hadoop Tutorial (useful for understanding Hadoop in an application context)
- IBM MapReduce Tools for Eclipse: out of date. Use the Eclipse Plugin in MapReduce/Contrib instead.
- The Hadoop IRC channel is #hadoop at irc.freenode.net.
- Using Spring and Hadoop (discussion of possibilities for using Hadoop and Dependency Injection with Spring)
- Univa Grid Engine Integration: a blog post about the integration of Hadoop with Univa Grid Engine, the Grid Engine successor
- Hadoop Grid Engine Integration: Open Grid Scheduler/Grid Engine Hadoop integration setup instructions
- Hadoop Tutorial Series: learning progressively more important core Hadoop concepts with hands-on experiments using the Cloudera Virtual Machine
- Pydoop: a Python MapReduce and HDFS API for Hadoop (tutorial)
- Dumbo: a project that allows you to easily write and run Hadoop programs in Python
- Hadoop distributed file system
- New Hadoop Connector Enables Ultra-Fast Transfer of Data between Hadoop and Aster Data's MPP Data Warehouse
- Hadoop + CUDA
- Hadoop on ARM cluster: a study comparing the energy consumption and performance of Hadoop MapReduce applications on an ARM cluster and on a general x86_64 cluster
- HDFS Architecture Documentation: an overview of the HDFS architecture, intended for contributors
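The Map/Reduce paradigm described at the top of this page can be illustrated with a minimal, single-process word count sketch in Python. This is not Hadoop's API and not how Hadoop itself is implemented: the function names (word_count_mapper, sum_reducer, run_map_task, run_job) are hypothetical, a thread pool stands in for cluster nodes, and the shuffle step is an in-memory dictionary rather than Hadoop's distributed sort and transfer. It only shows the shape of the computation: independent map tasks over input splits, grouping of intermediate pairs by key, and one reduce call per key.

```python
# Hypothetical, single-process sketch of the Map/Reduce idea (word count).
# NOT the Hadoop API: threads stand in for cluster nodes, and the shuffle
# is an in-memory dictionary instead of a distributed sort-and-transfer.
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

KeyValue = Tuple[str, int]
Mapper = Callable[[str], Iterable[KeyValue]]
Reducer = Callable[[str, List[int]], KeyValue]


def word_count_mapper(line: str) -> Iterator[KeyValue]:
    """Map: emit (word, 1) for every whitespace-separated word, lower-cased."""
    for word in line.split():
        yield word.lower(), 1


def sum_reducer(key: str, values: List[int]) -> KeyValue:
    """Reduce: add up every count collected for one key."""
    return key, sum(values)


def run_map_task(split: List[str], mapper: Mapper) -> List[KeyValue]:
    """One map task: a self-contained fragment of work over one input split.

    Its output depends only on its split, which is what makes it safe to
    run, or re-run after a failure, on any node.
    """
    return [pair for line in split for pair in mapper(line)]


def run_job(splits: List[List[str]], mapper: Mapper, reducer: Reducer) -> List[KeyValue]:
    # Map phase: run the independent map tasks concurrently.
    with ThreadPoolExecutor() as pool:
        map_outputs = list(pool.map(lambda split: run_map_task(split, mapper), splits))

    # Shuffle: group all intermediate values by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for output in map_outputs:
        for key, value in output:
            groups[key].append(value)

    # Reduce phase: one reducer call per key, in sorted key order.
    return [reducer(key, groups[key]) for key in sorted(groups)]


if __name__ == "__main__":
    splits = [["the quick brown fox", "the lazy dog"], ["The fox jumps"]]
    print(run_job(splits, word_count_mapper, sum_reducer))
```

Running it prints [('brown', 1), ('dog', 1), ('fox', 2), ('jumps', 1), ('lazy', 1), ('quick', 1), ('the', 3)]. For real jobs, the WordCount, Python Word Count and HadoopStreaming pages listed above describe the actual Hadoop interfaces.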