OpenVZ / OVZ-5337

[JAVA & CPUS=1] FUTEX_WAKE and FUTEX_CLOCK_REALTIME broken / wait forever?

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Fix Version/s: OpenVZ-legacy
    • Component/s: Containers::Kernel
    • Security Level: Public
    • Environment:
      Operating System: All
      Platform: x86_64 (AMD64)

      Description

      Hi,

      I've seen this bug very often when running Java-based applications in VPSes on OpenVZ. Sadly, I wasn't able to provide an example using publicly available software in the past.

      Now I am.

      First, some details:
      1.) CT0 / host system is running Debian Lenny with a custom-configured RHEL6 2.6.32-049.5 based kernel
      2.) CT100 is running Debian Squeeze

      I'm able to reproduce these FUTEX hangs using minecraft_server.jar. I can reproduce this with both OpenJDK and the Sun Java JRE (different versions tested).

      I'm starting the Minecraft server with:
      ~# java -Xms512M -Xmx850M -jar minecraft_server.jar nogui

      It then seems to start fine, but at some point while preparing the spawn area it just stops. The point at which it stops is different on every start.

      Example output:
      177 recipes
      27 achievements
      2012-03-05 18:58:19 [INFO] Starting minecraft server version 1.2.3
      2012-03-05 18:58:19 [INFO] Loading properties
      2012-03-05 18:58:19 [INFO] Starting Minecraft server on X:25565
      2012-03-05 18:58:19 [WARNING] **** SERVER IS RUNNING IN OFFLINE/INSECURE MODE!
      2012-03-05 18:58:19 [WARNING] The server will make no attempt to authenticate usernames. Beware.
      2012-03-05 18:58:19 [WARNING] While this makes the game possible to play without internet access, it also opens up the ability for hackers to connect with any username they choose.
      2012-03-05 18:58:19 [WARNING] To change this, set "online-mode" to "true" in the server.settings file.
      2012-03-05 18:58:19 [INFO] Preparing level "Lets Mine Together"
      2012-03-05 18:58:19 [INFO] Default game type: 0
      2012-03-05 18:58:20 [INFO] Preparing start region for level 0
      2012-03-05 18:58:21 [INFO] Preparing spawn area: 12%
      2012-03-05 18:58:22 [INFO] Preparing spawn area: 16%
      2012-03-05 18:58:23 [INFO] Preparing spawn area: 24%
      2012-03-05 18:58:24 [INFO] Preparing spawn area: 32%
      2012-03-05 18:58:25 [INFO] Preparing spawn area: 40%
      2012-03-05 18:58:26 [INFO] Preparing spawn area: 52%
      2012-03-05 18:58:27 [INFO] Preparing spawn area: 65%
      2012-03-05 18:58:28 [INFO] Preparing spawn area: 81%


      At that point strace shows only this:
      [pid 5011] futex(0x162a028, FUTEX_WAKE_PRIVATE, 1) = 0
      [pid 5011] futex(0x162a054, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1330974452, 492738000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
      [pid 5011] futex(0x162a028, FUTEX_WAKE_PRIVATE, 1) = 0
      [pid 5011] futex(0x162a054, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1330974452, 542959000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
      [pid 5011] futex(0x162a028, FUTEX_WAKE_PRIVATE, 1) = 0
      [pid 5011] futex(0x162a054, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1330974452, 593345000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
      [pid 5011] futex(0x162a028, FUTEX_WAKE_PRIVATE, 1) = 0
      [pid 5011] futex(0x162a054, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1330974452, 643570000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
      [pid 5011] futex(0x162a028, FUTEX_WAKE_PRIVATE, 1) = 0
      [pid 5011] futex(0x162a054, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1330974452, 694108000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
      [pid 5011] futex(0x162a028, FUTEX_WAKE_PRIVATE, 1) = 0
      [pid 5011] futex(0x162a054, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1330974452, 744438000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
      [pid 5011] futex(0x162a028, FUTEX_WAKE_PRIVATE, 1) = 0
      [pid 5011] futex(0x162a054, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1330974452, 794845000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
      [pid 5011] futex(0x162a028, FUTEX_WAKE_PRIVATE, 1) = 0
      [pid 5011] futex(0x162a054, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1330974452, 845246000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
      [pid 5011] futex(0x162a028, FUTEX_WAKE_PRIVATE, 1) = 0
      [pid 5011] futex(0x162a054, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1330974452, 895815000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
      [pid 5011] futex(0x162a028, FUTEX_WAKE_PRIVATE, 1) = 0
      [pid 5011] futex(0x162a054, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1330974452, 946122000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
      [pid 5011] futex(0x162a028, FUTEX_WAKE_PRIVATE, 1) = 0
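
      For anyone who wants to check wake-up delivery without Minecraft, here is a minimal sketch; it is not the reporter's test case. It assumes that the JVM in use implements `Object.wait(timeout)` and `notifyAll()` on Linux via futex-based condition variables, which would match the `FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME` and `FUTEX_WAKE_PRIVATE` calls in the trace above. The class and method names (`FutexWakeProbe`, `measureWake`) are hypothetical.

```java
/** Hypothetical probe: measures how long a timed wait takes to observe a notify. */
final class FutexWakeProbe {
    private final Object monitor = new Object();
    private boolean signalled;

    /**
     * Starts a helper thread that notifies after delayMillis, then waits for
     * that notification for at most maxWaitMillis. Returns the milliseconds
     * the caller spent waiting.
     */
    long measureWake(final long delayMillis, long maxWaitMillis) {
        synchronized (monitor) {
            signalled = false;
        }
        Thread waker = new Thread(() -> {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            synchronized (monitor) {
                signalled = true;
                monitor.notifyAll(); // assumed to end up as a futex wake
            }
        });
        long start = System.nanoTime();
        long deadline = start + maxWaitMillis * 1_000_000L;
        long elapsedMillis;
        waker.start();
        try {
            synchronized (monitor) {
                while (!signalled) {
                    long remaining = (deadline - System.nanoTime()) / 1_000_000L;
                    if (remaining <= 0) {
                        break; // gave up: the notification was never seen
                    }
                    monitor.wait(remaining); // assumed to end up as a timed futex wait
                }
            }
            elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
            waker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("probe interrupted", e);
        }
        return elapsedMillis;
    }

    public static void main(String[] args) {
        FutexWakeProbe probe = new FutexWakeProbe();
        for (int i = 0; i < 100; i++) {
            System.out.println("round " + i + ": woke after " + probe.measureWake(20, 5000) + " ms");
        }
    }
}
```

      If a round takes close to the full maximum wait instead of roughly the helper's delay, the waiter did not see the wake-up in time. A clean run does not prove the kernel is fine, since the hang described here appears only at an unpredictable point.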

      Why do I think this is a bug?
      Quite simply, I've seen exactly this behaviour in totally different Java-based apps, for example Amazon's AMTU server. Also, this works absolutely fine on an OpenVZ 2.6.18 based kernel.

      Stefan

          Activity

          openvz@yasukun.org Yasuyuki Nakamura added a comment -

          (In reply to comment #26)
          > No, it's not. This hack disables UP optimizations in jvm if machine is
          > actually SMP.

          Thanks for explaining.

          I have now tested using the description in the following ticket:
          http://bugzilla.openvz.org/show_bug.cgi?id=2314
          (that ticket is a duplicate of OVZ-5337)

          The Java program still misbehaves with the 2.6.32-042stab061.2 kernel.
          So it seems that either the bug is not fixed or I have found another problem.
          Do you have any ideas?

          khlebnikov@openvz.org Konstantin Khlebnikov added a comment -

          Probably this workaround in sysfs just does not work for your JVM/libc version, or libc cannot get access to /sys for some reason. Currently CPUS=1 does nothing useful except hack /proc/cpuinfo, and that breaks the JVM. So please use CPULIMIT=100 instead.
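
          To see whether a container is affected by such a mismatch, one can compare the CPU count the JVM reports with what /proc/cpuinfo and sysfs show inside the container. The following is only a sketch: the sysfs path `/sys/devices/system/cpu/online` is a standard Linux file, but that it is the file the workaround relies on is an assumption, and the class name `CpuCountCheck` is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

/** Hypothetical helper: prints the CPU counts visible through three different views. */
final class CpuCountCheck {
    /** Counts "processor" entries in /proc/cpuinfo-style lines (x86 layout). */
    static int countProcCpuinfo(List<String> lines) {
        int count = 0;
        for (String line : lines) {
            if (line.startsWith("processor")) {
                count++;
            }
        }
        return count;
    }

    /** Counts the CPUs named by a sysfs CPU list such as "0-3,5". */
    static int countCpuList(String list) {
        int count = 0;
        for (String part : list.trim().split(",")) {
            if (part.isEmpty()) {
                continue;
            }
            String[] range = part.split("-");
            int low = Integer.parseInt(range[0].trim());
            int high = Integer.parseInt(range[range.length - 1].trim());
            count += high - low + 1;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("JVM availableProcessors: "
                + Runtime.getRuntime().availableProcessors());
        try {
            List<String> cpuinfo =
                    Files.readAllLines(Paths.get("/proc/cpuinfo"), StandardCharsets.UTF_8);
            System.out.println("/proc/cpuinfo processors: " + countProcCpuinfo(cpuinfo));
        } catch (IOException e) {
            System.out.println("/proc/cpuinfo not readable: " + e);
        }
        try {
            // Assumed path; libc may consult a different file on some versions.
            byte[] raw = Files.readAllBytes(Paths.get("/sys/devices/system/cpu/online"));
            String online = new String(raw, StandardCharsets.UTF_8);
            System.out.println("sysfs online CPUs: " + countCpuList(online));
        } catch (IOException e) {
            System.out.println("sysfs CPU list not readable: " + e);
        }
    }
}
```

          If the three numbers disagree inside a CPUS=1 container, or /sys is not readable at all, that would be consistent with the explanation above, though it does not confirm it.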

          openvz@yasukun.org Yasuyuki Nakamura added a comment -

          (In reply to comment #28)
          > Probably this workaround in sysfs just does not work for your JVM/libc
          > version, or libc cannot get access to /sys for some reason. Currently CPUS=1
          > does nothing useful except hack /proc/cpuinfo, and that breaks the JVM. So
          > please use CPULIMIT=100 instead.

          Do you mean this bug is not fixed?

          In fact, I compared CPUS=1 and CPUS="unlimited", and the VEs perform better with CPUS=1.
          I am also aiming for ever higher density on my hosts,
          so I hope this bug gets fixed.
          Do you have any plans?

          sergeyb Sergey Bronnikov added a comment -

          The bug was fixed more than a year ago, and there have been no complaints from the reporter since the fix. We believe the fix helped and are marking the bug as closed.


            People

            • Assignee:
              khlebnikov@openvz.org Konstantin Khlebnikov
              Reporter:
              s.priebe@profihost.com Stefan Priebe