There are strong cases for both formats:

- Both are open standards, with lots of software to view, convert etc.

- glTF is more efficient to deliver to the GPU.

  X3D in the non-binary encodings (XML, classic VRML) has to be parsed, which means your bottleneck is StrToFloat().

  The X3D binary encoding isn't very popular (not much software can view/convert it), and it's not ideal for delivering to the GPU anyway. It removes the parsing problem (all those StrToFloat calls), but it doesn't add many ways to express mesh data (per-vertex values, interleaving etc.). (Not sure about this: I don't know X3D binary well, but it probably has no interleaving?)
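
To make the parsing vs. binary-buffer difference concrete, here is a minimal Python sketch (stdlib only). `parse_x3d_points` mimics reading an X3D `MFVec3f` value from text, where every number goes through a string-to-float conversion. `read_interleaved_vec3` mimics how a glTF loader locates elements in a binary buffer: glTF's `byteOffset` (on the accessor and bufferView) and `bufferView.byteStride` say where each element starts, so positions and normals can share one interleaved buffer. Both helper names are hypothetical, and the Python reads only illustrate the layout; a real engine would typically upload such a buffer to the GPU as-is, without touching individual values.

```python
import struct


def parse_x3d_points(text):
    """Parse an X3D-style MFVec3f value, e.g. "0 0 0, 1 0 0".
    Commas count as whitespace; every number needs a string-to-float conversion."""
    values = [float(token) for token in text.replace(",", " ").split()]
    return [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]


def read_interleaved_vec3(buffer, byte_offset, byte_stride, count):
    """Read `count` little-endian float32 VEC3 elements from a binary buffer.
    byte_offset and byte_stride play the roles of glTF's (accessor + bufferView)
    byteOffset and bufferView.byteStride."""
    return [struct.unpack_from("<3f", buffer, byte_offset + i * byte_stride)
            for i in range(count)]


# One interleaved buffer: position (3 floats) then normal (3 floats) per vertex,
# so each vertex takes 24 bytes and normals start 12 bytes into each vertex.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
normals = [(0.0, 0.0, 1.0)] * 3
interleaved = b"".join(struct.pack("<3f3f", *p, *n) for p, n in zip(positions, normals))

print(parse_x3d_points("0 0 0, 1 0 0, 0 1 0"))
print(read_interleaved_vec3(interleaved, 0, 24, 3))   # positions
print(read_interleaved_vec3(interleaved, 12, 24, 3))  # normals
```

The text path costs one conversion per number; the binary path only computes offsets, which is why glTF data can go to the GPU with little or no CPU work.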

- glTF is more modern (e.g. physically based rendering, PBR, is part of its core material model).

- glTF seems to have more exporter support.

  Blender:
  - the glTF 2.0 exporter is developed by Khronos,
  - there are versions for both Blender <= 2.79 and >= 2.80,
  - it's even included in Blender by default,
  - it supports animations and normal maps.

  In contrast:
  - the X3D exporter only works with Blender <= 2.79, until someone fixes it,
  - it was included in Blender by default, but is not in 2.80,
  - it doesn't support animations or normal maps (the CGE version of the exporter adds normal maps, and can export to CGE's own animation format).

- glTF doesn't offer any interaction.

  No sensors (touch sensor, proximity sensor),
  no scripting.

- glTF doesn't offer some graphics features.

  No cubemap textures,
  no 3D textures,
  no texture coordinate generation modes,
  fewer ways to mix textures (although MultiTexturing in X3D comes with a large bag of problems, they can be overcome by fixing various spec points, see CGE docs).

  No lights (image-based or not),
  at least not in the core standard.
  There is an extension for "punctual" (i.e. not image-based) lights:
  https://github.com/KhronosGroup/glTF/tree/master/extensions/2.0/Khronos/KHR_lights_punctual
  and (less stable, because it's only an EXT extension, not KHR) one for image-based lights:
  https://github.com/KhronosGroup/glTF/tree/master/extensions/2.0/Vendor/EXT_lights_image_based

  No fog.
  No background.
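
For comparison with X3D, where a light is just another standard node (e.g. `PointLight`), this Python sketch builds a minimal glTF JSON document using `KHR_lights_punctual`: lights are declared once in the top-level `extensions` object and attached to a node by index. The function name and the values are illustrative assumptions; the property names follow the extension's spec as I understand it. Listing the extension in `extensionsUsed` (and not in `extensionsRequired`) lets loaders that don't support it still open the file, just without the light.

```python
import json


def make_gltf_with_point_light(color, intensity, translation):
    """Build a minimal glTF 2.0 document with one KHR_lights_punctual point light
    attached to a single node (function name and values are illustrative)."""
    light = {"type": "point", "color": list(color), "intensity": intensity}
    return {
        "asset": {"version": "2.0"},
        # Optional extension: loaders that don't know it can still load the file.
        "extensionsUsed": ["KHR_lights_punctual"],
        # Lights are declared once, at the top level...
        "extensions": {"KHR_lights_punctual": {"lights": [light]}},
        "scene": 0,
        "scenes": [{"nodes": [0]}],
        # ...and a node refers to a light by its index in that array.
        "nodes": [{
            "name": "Lamp",
            "translation": list(translation),
            "extensions": {"KHR_lights_punctual": {"light": 0}},
        }],
    }


doc = make_gltf_with_point_light((1.0, 0.9, 0.8), 20.0, (0.0, 3.0, 0.0))
print(json.dumps(doc, indent=2))
```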

- glTF doesn't offer some other features; here are some important ones:

  No easy way to display text
  (leaving the Text -> mesh conversion to 3D exporters is *not* enough: it doesn't allow you to dynamically modify the text during the game).

  No sound.

  No NURBS.

  No shaders (although X3D shaders also need extending to be composable; see the CGE extension).
  Later note: actually, there is an extension addressing this (it was never ratified, and has since been moved to the archived extensions):
  https://github.com/KhronosGroup/glTF/tree/master/extensions/2.0/Khronos/KHR_techniques_webgl

  No way to compose many models, like X3D Inline does?

  Although (in defense of glTF) most X3D exporters don't use these features either.
  E.g. Blender text is exported as a mesh to X3D,
  Blender sound objects are not exported to X3D,
  Blender NURBS are exported as a mesh to X3D.
  Merely having these concepts in the X3D standard is not a magic wand that makes them used.

  But at least the API (X3D nodes) exists to use NURBS, sound etc.
  E.g. the CGE Spine conversion converts Spine paths to X3D NURBS.

- glTF doesn't offer a scene graph?
  It does have a node hierarchy, but glTF nodes only carry transformations, meshes, cameras and skins.
  A scene graph is important as the engine API to build and modify what is rendered,
  using X3D nodes.
  Although one can argue that applying glTF concepts in a straightforward way
  also allows you to define a reasonable API to operate on a scene.
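
The counter-argument in the last point can be sketched like this: glTF nodes already form a hierarchy (each node has an optional `translation`, a `mesh` index and `children` indices), and a thin API over that is enough to build and modify a scene at runtime. The `Node`/`Scene` classes below are hypothetical, not the API of CGE or of any glTF library, and only translation is handled (rotation and scale are omitted to keep it short).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    """A glTF-like node (hypothetical): name, local translation, mesh index, children."""
    name: str
    translation: Vec3 = (0.0, 0.0, 0.0)
    mesh: Optional[int] = None  # index into a meshes array, like glTF's node.mesh
    children: List[int] = field(default_factory=list)  # indices into Scene.nodes


class Scene:
    """Hypothetical scene API: nodes live in a flat list and refer to each other by index."""

    def __init__(self) -> None:
        self.nodes: List[Node] = []
        self.roots: List[int] = []

    def add_node(self, name: str, translation: Vec3 = (0.0, 0.0, 0.0),
                 parent: Optional[int] = None, mesh: Optional[int] = None) -> int:
        index = len(self.nodes)
        self.nodes.append(Node(name, tuple(translation), mesh))
        if parent is None:
            self.roots.append(index)
        else:
            self.nodes[parent].children.append(index)
        return index

    def set_translation(self, index: int, translation: Vec3) -> None:
        self.nodes[index].translation = tuple(translation)

    def world_positions(self) -> Dict[str, Vec3]:
        """Accumulate translations from the roots down (rotation/scale omitted)."""
        result: Dict[str, Vec3] = {}

        def visit(index: int, parent_pos: Vec3) -> None:
            node = self.nodes[index]
            pos = tuple(p + t for p, t in zip(parent_pos, node.translation))
            result[node.name] = pos
            for child in node.children:
                visit(child, pos)

        for root in self.roots:
            visit(root, (0.0, 0.0, 0.0))
        return result


scene = Scene()
car = scene.add_node("Car", (10.0, 0.0, 0.0))
wheel = scene.add_node("Wheel", (1.0, -0.5, 0.0), parent=car, mesh=0)
scene.set_translation(car, (20.0, 0.0, 0.0))  # moving the parent moves the child too
print(scene.world_positions())
```

Moving the car also moves the wheel, because world positions are accumulated down the hierarchy. What such an API lacks, compared to X3D, are the many specialized node types (sensors, sound, text etc.).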