In HDFS, reads normally go through the DataNode: when the client asks the DataNode to read a file, the DataNode reads that file off the disk and sends the data to the client over a TCP socket. So-called “short-circuit” reads bypass the DataNode, allowing the client to read the file directly. Obviously, this is only possible in cases where the client is co-located with the data. Short-circuit reads provide a substantial performance boost to many applications.
Setup
To configure short-circuit local reads, you will need to enable libhadoop.so. See Native Libraries for details on enabling this library. If you do not compile Hadoop yourself, you can use a precompiled binary distribution instead.

Short-circuit reads make use of a UNIX domain socket. This is a special path in the filesystem that allows the client and the DataNodes to communicate. You will need to set a path to this socket. The DataNode needs to be able to create this path. On the other hand, it should not be possible for any user except the HDFS user or root to create this path. For this reason, paths under /var/run or /var/lib are often used.
The client and the DataNode exchange information via a shared memory segment on /dev/shm.
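To make the UNIX domain socket mechanism concrete, here is a minimal sketch in plain Java (16 or later), using java.net.UnixDomainSocketAddress. The socket path and block name are made up for illustration; this is not HDFS code, only a demonstration of the kind of local channel the client and DataNode use:

```java
import java.io.IOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class UdsDemo {
  public static void main(String[] args) throws Exception {
    Path socketPath = Path.of("/tmp/uds-demo.sock"); // hypothetical socket path
    Files.deleteIfExists(socketPath);
    UnixDomainSocketAddress addr = UnixDomainSocketAddress.of(socketPath);

    try (ServerSocketChannel server =
             ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
      server.bind(addr);

      // "DataNode" side: accept one connection and send a (made-up) block id.
      Thread dataNode = new Thread(() -> {
        try (SocketChannel peer = server.accept()) {
          peer.write(ByteBuffer.wrap(
              "blk_1073741825".getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
          throw new RuntimeException(e);
        }
      });
      dataNode.start();

      // "Client" side: connect over the domain socket and read the reply.
      try (SocketChannel client = SocketChannel.open(addr)) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        client.read(buf);
        buf.flip();
        System.out.println(StandardCharsets.UTF_8.decode(buf));
      }
      dataNode.join();
    } finally {
      Files.deleteIfExists(socketPath);
    }
  }
}
```

In real short-circuit reads, the socket is not used to carry the file data itself; it carries file descriptors and metadata so the client can open the block file directly.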
Short-circuit local reads need to be configured on both the DataNode and the client.
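A minimal hdfs-site.xml fragment enabling short-circuit reads might look like the following; the socket path is only an example, and any path that the DataNode can create but other users cannot will do:

```xml
<configuration>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <!-- Example path; the DataNode must be able to create it. -->
    <value>/var/lib/hadoop-hdfs/dn_socket</value>
  </property>
</configuration>
```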
Java cannot use UNIX domain sockets directly, so you need to install the Hadoop native library, libhadoop.so. If you use a distribution such as Pivotal HD or CDH, the native library is installed along with the Hadoop packages. You can verify that the native library is available with the hadoop checknative command.
A legacy implementation of short-circuit local reads, in which the client directly opens the HDFS block files, is still available for platforms other than Linux. Setting the value of dfs.client.use.legacy.blockreader.local, in addition to dfs.client.read.shortcircuit, to true enables this feature.
You also need to set the value of dfs.datanode.data.dir.perm to 750 instead of the default 700, and chmod/chown the directory tree under dfs.datanode.data.dir so that it is readable by both the client and the DataNode. You must take caution, because this means that the client can read all of the block files, bypassing HDFS permissions.
Because legacy short-circuit local reads are insecure, access to this feature is limited to the users listed in the value of dfs.block.local-path-access.user.
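Putting the legacy settings together, an hdfs-site.xml fragment might look like this; the user names are illustrative, not prescribed:

```xml
<configuration>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.use.legacy.blockreader.local</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir.perm</name>
    <value>750</value>
  </property>
  <property>
    <name>dfs.block.local-path-access.user</name>
    <!-- Example user list; only these users may use legacy short-circuit reads. -->
    <value>hbase,impala</value>
  </property>
</configuration>
```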
```java
private static final Logger LOG =
    LoggerFactory.getLogger(NativeCodeLoader.class);

private static boolean nativeCodeLoaded = false;

static {
  // Try to load native hadoop library and set fallback flag appropriately
  if (LOG.isDebugEnabled()) {
    LOG.debug("Trying to load the custom-built native-hadoop library...");
  }
  try {
    System.loadLibrary("hadoop");
    LOG.debug("Loaded the native-hadoop library");
    nativeCodeLoaded = true;
  } catch (Throwable t) {
    // Ignore failure to load
    if (LOG.isDebugEnabled()) {
      LOG.debug("Failed to load native-hadoop with error: " + t);
      LOG.debug("java.library.path=" +
          System.getProperty("java.library.path"));
    }
  }
  if (!nativeCodeLoaded) {
    LOG.warn("Unable to load native-hadoop library for your platform... " +
        "using builtin-java classes where applicable");
  }
}

/**
 * Check if native-hadoop code is loaded for this platform.
 *
 * @return <code>true</code> if native-hadoop is loaded,
 *         else <code>false</code>
 */
public static boolean isNativeCodeLoaded() {
  return nativeCodeLoaded;
}

/**
 * Returns true only if this build was compiled with support for snappy.
 */
public static native boolean buildSupportsSnappy();

/**
 * Returns true only if this build was compiled with support for ZStandard.
 */
public static native boolean buildSupportsZstd();

/**
 * Returns true only if this build was compiled with support for openssl.
 */
public static native boolean buildSupportsOpenssl();

public static native String getLibraryName();

/**
 * Return if native hadoop libraries, if present, can be used for this job.
 * @param conf configuration
 *
 * @return <code>true</code> if native hadoop libraries, if present, can be
 *         used for this job; <code>false</code> otherwise.
 */
public boolean getLoadNativeLibraries(Configuration conf) {
  return conf.getBoolean(CommonConfigurationKeys.IO_NATIVE_LIB_AVAILABLE_KEY,
      CommonConfigurationKeys.IO_NATIVE_LIB_AVAILABLE_DEFAULT);
}

/**
 * Set if native hadoop libraries, if present, can be used for this job.
 *
 * @param conf configuration
 * @param loadNativeLibraries can native hadoop libraries be loaded
 */
public void setLoadNativeLibraries(Configuration conf,
    boolean loadNativeLibraries) {
  conf.setBoolean(CommonConfigurationKeys.IO_NATIVE_LIB_AVAILABLE_KEY,
      loadNativeLibraries);
}
```