Common YARN Exceptions

程序员文章站 2022-04-30 09:14:16

Exception 1:

2012-05-16 16:18:42,468 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Failed to launch container.
java.io.FileNotFoundException: File /tmp/nm-local-dir/usercache/a/appcache/application_1337150856633_0016 does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:449)
        at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:864)
        at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
        at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
        at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:700)
        at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:697)
        at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2319)
        at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:697)

Cause A:

In yarn-site.xml, the parameters yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs are left at their default paths (/tmp/..), so the disk holding /tmp fills up.

Cause B:

In org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor, the directory-creation call fails by default when the parent of the directory to be created does not exist. The relevant code is:

 

  @Override
  public int launchContainer(Container container,
      Path nmPrivateContainerScriptPath, Path nmPrivateTokensPath,
      String userName, String appId, Path containerWorkDir,
      List<String> localDirs, List<String> logDirs) throws IOException {

    ContainerId containerId = container.getContainerID();

    // create container dirs on all disks
    String containerIdStr = ConverterUtils.toString(containerId);
    String appIdStr =
        ConverterUtils.toString(
            container.getContainerID().getApplicationAttemptId().
                getApplicationId());
    for (String sLocalDir : localDirs) {
      Path usersdir = new Path(sLocalDir, ContainerLocalizer.USERCACHE);
      Path userdir = new Path(usersdir, userName);
      Path appCacheDir = new Path(userdir, ContainerLocalizer.APPCACHE);
      Path appDir = new Path(appCacheDir, appIdStr);
      Path containerDir = new Path(appDir, containerIdStr);
      lfs.mkdir(containerDir, null, false);
    }

When the third argument of mkdir is false, the call fails if the parent of the directory being created does not exist. Change the call to:

 

lfs.mkdir(containerDir, null, true);
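
The effect of that flag can be tried outside Hadoop. The following is a minimal sketch that uses java.nio.file as an analogy, not Hadoop's FileContext; the class MkdirParentDemo, its mkdir helper and the sample path are hypothetical names introduced only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class MkdirParentDemo {
    // Hypothetical helper mimicking a mkdir(dir, createParent) call.
    // Returns false when the parent directory is missing.
    static boolean mkdir(Path dir, boolean createParent) throws IOException {
        try {
            if (createParent) {
                Files.createDirectories(dir); // also creates missing ancestors
            } else {
                Files.createDirectory(dir);   // requires the parent to exist
            }
            return true;
        } catch (NoSuchFileException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed layout, loosely modelled on the path in the log above.
        Path root = Files.createTempDirectory("nm-local-dir");
        Path containerDir = root.resolve("usercache/a/appcache/app_0016/container_01");

        System.out.println("createParent=false: " + mkdir(containerDir, false));
        System.out.println("createParent=true:  " + mkdir(containerDir, true));
    }
}
```

With createParent set to false the helper reports failure because usercache/a/appcache/app_0016 does not exist yet; with true the whole chain is created in one call.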

Exception 2:

2012-05-16 09:56:57,078 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localizer failed
java.lang.ArithmeticException: / by zero
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:355)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115)
        at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalPathForWrite(LocalDirsHandlerService.java:257)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:854)

 

Cause:

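This error occurs because local files cannot be created. It may be caused by the disk-space shortage described under Exception 1; pointing yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs at a different location resolves it.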
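
One plausible way for an unusable set of local directories to surface as "/ by zero" is a round-robin choice that takes a counter modulo the number of usable directories. The sketch below illustrates that failure mode only; it is not the code of LocalDirAllocator, and the class DirPickerDemo and both methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DirPickerDemo {
    // Hypothetical round-robin picker; counter is assumed to be non-negative.
    // With an empty list, counter % 0 throws ArithmeticException("/ by zero").
    static String pick(List<String> usableDirs, int counter) {
        return usableDirs.get(counter % usableDirs.size());
    }

    // Same picker with an explicit check, so the failure names its real cause.
    static String pickSafely(List<String> usableDirs, int counter) {
        if (usableDirs.isEmpty()) {
            throw new IllegalStateException("no usable local directories (disk full?)");
        }
        return pick(usableDirs, counter);
    }

    public static void main(String[] args) {
        List<String> dirs = Arrays.asList("/data1/nm-local-dir", "/data2/nm-local-dir");
        System.out.println(pick(dirs, 3));

        try {
            pick(new ArrayList<String>(), 3);
        } catch (ArithmeticException e) {
            System.out.println("empty dir list: " + e.getMessage());
        }
    }
}
```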

Exception 3:

The log shows no obvious ERROR, but contains entries like the following:

2012-05-16 13:08:20,876 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 18, cluster_timestamp: 1337134318909, }, attemptId: 1, }, id: 6, }, state: C_COMPLETE, diagnostics: "Container [pid=15641,containerID=container_1337134318909_0018_01_000006] is running beyond virtual memory limits. Current usage: 32.1mb of 1.0gb physical memory used; 6.2gb of 2.1gb virtual memory used. Killing container.\nDump of the process-tree for container_1337134318909_0018_01_000006 :\n\t|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE\n\t|- 15641 26354 15641 15641 (java) 36 2 6686339072 8207 /home/zhouchen.zm/jdk1.6.0_23/bin/java

 

Cause:

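This error comes from the way YARN enforces virtual memory limits. In the example above the user program requested 1 GB of memory; YARN multiplies this value by a ratio (2.1 by default) to obtain the container's virtual memory limit. When the virtual memory the program actually uses exceeds that limit (here 6.2 GB used against a 2.1 GB limit), YARN kills the container and reports the message above. Adjusting the ratio resolves the problem; the parameter is yarn.nodemanager.vmem-pmem-ratio in yarn-site.xml.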
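
The arithmetic behind that check can be reproduced with the numbers from the log. This is a rough sketch, not NodeManager code: the class VmemLimitDemo and its methods are hypothetical, and it assumes that "1.0gb" in the log means 1024^3 bytes.

```java
public class VmemLimitDemo {
    // Virtual memory limit = requested physical memory * vmem-pmem ratio.
    static long vmemLimitBytes(long pmemBytes, double ratio) {
        return (long) (pmemBytes * ratio);
    }

    // True when the container would be killed for exceeding the limit.
    static boolean exceedsLimit(long vmemUsedBytes, long pmemBytes, double ratio) {
        return vmemUsedBytes > vmemLimitBytes(pmemBytes, ratio);
    }

    // Smallest ratio at which the observed usage would still fit.
    static double minRatio(long vmemUsedBytes, long pmemBytes) {
        return (double) vmemUsedBytes / pmemBytes;
    }

    public static void main(String[] args) {
        long pmem = 1L << 30;          // 1 GB requested (assumed to be 1024^3 bytes)
        long vmemUsed = 6686339072L;   // VMEM_USAGE(BYTES) from the process-tree dump
        double ratio = 2.1;            // default ratio stated above

        System.out.println("limit bytes: " + vmemLimitBytes(pmem, ratio));
        System.out.println("killed at 2.1: " + exceedsLimit(vmemUsed, pmem, ratio));
        System.out.println("minimum ratio: " + minRatio(vmemUsed, pmem));
    }
}
```

For this container the ratio would have to be raised above roughly 6.2 before the observed usage fits within the limit.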