<div dir="auto">Even as a start, semantics metadata to image or video would be a start.   I’m not sure what’s possible with Graph Convolution Networks (GCNs).</div><div dir="auto"><br></div><div dir="auto">John</div><div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Oct 1, 2022 at 9:17 PM John Carlson <<a href="mailto:yottzumm@gmail.com">yottzumm@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)"><div dir="auto">I realize image-to-3D and video-to-3D may be possible already.</div><div dir="auto"><br></div><div dir="auto">John</div><div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Oct 1, 2022 at 9:05 PM John Carlson <<a href="mailto:yottzumm@gmail.com" target="_blank">yottzumm@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">Now with prompt engineering, we can do text-to-image and text-to-video.   Now, how about text(description)-to-mesh, text(description)-to-shape, text(description)-to-scene, or even text(description)-to-world?<div dir="auto"><br></div><div dir="auto">Can semantics help this?</div><div dir="auto"><br></div><div dir="auto">John</div>

</blockquote></div></div>

</blockquote></div></div>