[Gmsh] "Re: GMSH parsing is very slow"

Thu Jan 31 17:01:16 CET 2019

> On 31 Jan 2019, at 15:48, Al Sc <al.sc.gmsh at gmail.com> wrote:
> 
> Dear Christoph,
> 
> to be honest, a remark deeply hidden somewhere in the documentation wouldn't have helped me.
> That's because my approach typically is not to read the documentation like a book (i.e. from front to back) but rather to skip forwards to the paragraphs I currently need for the implementation.
> I believe most users work in a similar way.

Looking at the doc the explanation was indeed already there... deeply hidden in the FAQ: http://gmsh.info/dev/doc/texinfo/gmsh.html#Geometry-module-questions ;-)

I've patched the documentation so that the information is now right there in the definition of each command. See e.g. http://gmsh.info/dev/doc/texinfo/gmsh.html#Miscellaneous-mesh-commands

> My personal suggestion would be that you "detect" in your backwards-compatible gmsh parser the event where -something-like- more than 1000 CAD-to-topology synchronizations (or whatever you call it) are implicitly done. And if this happens, just _FAIL_. This slightly breaks your compatibility, but only in corner cases where performance is too bad for practical purposes anyway. A user may want to override this behavior using a special command line option like "-cad-sync-limit infinity" - in case he thinks otherwise.
> 

The future is the API!

Christophe

> Best regards
> A. S.
> 
> Am Do., 31. Jan. 2019 um 15:39 Uhr schrieb Christophe Geuzaine <cgeuzaine at uliege.be>:
> 
> 
> > On 31 Jan 2019, at 15:19, Al Sc <al.sc.gmsh at gmail.com> wrote:
> > 
> > Dear Christophe,
> > 
> > I tried out your gmsh-file with "Point{100+1:100+N} In Volume{1};" and it's indeed much faster!
> > That indeed seems to be the solution I needed -- thanks a lot!
> > 
> > As I don't really know much about the internal functionality of gmsh, it would have been almost impossible for me to come up with that solution on my own.
> > From your documentation it is clear that the input to gmsh is more than just a geometry specification, and it's rather kind of a script which generates the geometry.
> > However, given only the documentation, a naive user like me will conclude that you do not *need* any advanced scripting functionality to generate a basic model. E.g. that you can restrict yourself to "just using the geometry-specification subset of the language" without any performance penalty. Hence, a naive user concludes that:
> > " Point{100+1:100+N} In Volume{1};"
> > and
> > " Point{100} In Volume{1}; ...
> >  ... Point{10100} In Volume{1};"
> > are, in fact, equivalent (not only in semantics, but also in "parser performance").
> 
> They are equivalent. The difference is *when* you call them.
> 
> > 
> > I'd suggest that you try to implement the model parsing in a way that the "topology structure" is not immediately re-computed, but only invalidated.
> > Because, in my opinion, the observed performance-drop of gmsh with repeated "Point In Volume" statements is still highly counter-intuitive, even to users more advanced than me.
> > 
> 
> Indeed. This is all made clear when you use the API, where the distinction between CAD operations and topological model operations is made explicit. 
> 
> For example, if using the Python API you try to do
> 
> for i in range(N):
>   gmsh.model.occ.addPoint(...)
>   gmsh.model.mesh.embed(...)
> 
> you will get an error, stating that the point does not exist in the model - since no call to gmsh.model.occ.synchronize() has been made after the point was added to the CAD. To make it work you would then do
> 
> for i in range(N):
>   gmsh.model.occ.addPoint(...)
>   gmsh.model.occ.synchronize(...)
>   gmsh.model.mesh.embed(...)
> 
> When N is large you would rapidly conclude that calling "synchronize" that many times will kill your performance (the API doc makes this explicit).
> 
> The .geo parser strives to be automatic and backward compatible with very old versions of Gmsh. So each "Point ... In ..." command thus checks if the CAD has been modified - and if it has, triggers a synchronization automatically. We could add a note in the documentation about this, by stating explicitly which operations will trigger a sync. I'm adding this to our TODO list.
> 
> Christophe
> 
> 
> 
> 
> > Best regards,
> > A.S.
> > 
> > Am Do., 31. Jan. 2019 um 14:53 Uhr schrieb Christophe Geuzaine <cgeuzaine at uliege.be>:
> > 
> > This was precisely the point of my example: if you embed the point after each point is created, you force a rebuild of the topology of the model. So the efficient script would be
> > 
> > SetFactory("OpenCASCADE");
> > Box(1) = {0,0,0, 1,1,1};
> > N=10000;
> > For i In {1:N}
> >   Point(100+i) = {0.25 + 5e-5*i, 0.1,0.1};
> > EndFor
> > Point{100+1:100+N} In Volume{1};
> > 
> > Christophe
> > 
> > 
> > > On 31 Jan 2019, at 14:48, Al Sc <al.sc.gmsh at gmail.com> wrote:
> > > 
> > > Dear Christophe,
> > > 
> > > my example files are scientific data, however originate from processing certain proprietary 3D models that were shared with me under certain restrictions. Therefore, it's difficult to share my original file with you.
> > > However, I was indeed able to reproduce the issue using only a slight modification of your example file!
> > > 
> > > Compare  the following two files:
> > > test1.geo:
> > > %%% %%% %%% %%% %%% %%% %%% %%% %%% %%%
> > > SetFactory("OpenCASCADE");
> > > Box(1) = {0,0,0, 1,1,1};
> > > For i In {1:1000}
> > >   Point(100+i) = {0.25 + 1e-4*i, 0.1,0.1};
> > >   Point{100+i} In Volume{1};
> > > EndFor
> > > %%% %%% %%% %%% %%% %%% %%% %%% %%% %%% 
> > > 
> > > test2.geo:
> > > %%% %%% %%% %%% %%% %%% %%% %%% %%% %%% 
> > > SetFactory("OpenCASCADE");
> > > Box(1) = {0,0,0, 1,1,1};
> > > For i In {1:10000}
> > >   Point(100+i) = {0.25 + 1e-5*i, 0.1,0.1};
> > >   Point{100+i} In Volume{1};
> > > EndFor
> > > %%% %%% %%% %%% %%% %%% %%% %%% %%% %%% 
> > > 
> > > The first file contains 1000 internal pts, and the second one contains 10 times as much.
> > > Now run:
> > >   $ time gmsh -1 test1.geo
> > >     -> Takes 0.88 sec
> > >   $ time gmsh -1 test2.geo 
> > >     -> Takes 75 sec 
> > > This is a nearly 100 (!!!) times increase in parsing run time, when the model size is increased by only a factor of 10.
> > > This nicely illustrates what I meant by "quadratic growth of the parsing run time as a function of the number of internal pts."
> > > As a consequence of this "quadratic growth", the run time for larger models quickly becomes enormous.
> > > 
> > > Best regards
> > > 
> > > A.S.
> > > 
> > > Am Do., 31. Jan. 2019 um 14:31 Uhr schrieb Christophe Geuzaine <cgeuzaine at uliege.be>:
> > > 
> > > Can you send a test file?
> > > 
> > > I tried this, i.e. adding 1000 embedded points in a volume, and it is quite fast (< 2 seconds):
> > > 
> > > SetFactory("OpenCASCADE");
> > > Box(1) = {0,0,0, 1,1,1};
> > > For i In {1:1000}
> > >   Point(100+i) = {0.25 + 5e-4*i, 0.1,0.1};
> > >   Point{100+i} In Volume{1};
> > > EndFor
> > > 
> > > Maybe you modify the model after each insertion of an embedded point, which would force a rebuild of the topological model?
> > > 
> > > Christophe
> > > 
> > > 
> > > > On 31 Jan 2019, at 13:04, Al Sc <al.sc.gmsh at gmail.com> wrote:
> > > > 
> > > > Dear Sir or Madam,
> > > > 
> > > > some more details on the previously-reported case of extremely slow parsing of a gmsh file:
> > > > The file contains 34795 points -- of which 23041 points are "Point In Volume" (e.g. embedded_vertices of a GRegion?). Moreover, it contains 4998 "Plane Surface"s and one single "Volume". Almost all plane surfaces are triangles and quads -- except for <10 facets, which each have a very high vertex count (typ. 7000).
> > > > 
> > > > I found out that if I remove all "Point In Volume" objects, then the parsing takes only 20 seconds. However, if all the "Point In Volume" objects are not removed, parsing takes around 3000 seconds. This led me to the hypothesis that there is some huge inefficiency with parsing gmsh files with "Point In Volume" (probably quadratic run time in N(PointInVolume)??).
> > > > 
> > > > Furthermore, I analyzed the runs using "perf" (linux utility).
> > > > Run 1 (about 3000 seconds):
> > > >   $ time perf record ./gmsh-3.0.6-Linux64/bin/gmsh -1 input_orig.geo
> > > > Run 2 (about 20 seconds):
> > > >   $ time perf record ./gmsh-3.0.6-Linux64/bin/gmsh -1 input_no_point_in_volume.geo
> > > > In Run 1, "perf report" yields the following top-scoring functions:
> > > >   11.57%  gmsh     gmsh                 [.] GModel::getEdgeByTag
> > > >    7.73%  gmsh     gmsh                 [.] GEO_Internals::synchronize
> > > >    6.80%  gmsh     libc-2.12.so         [.] _int_free
> > > >    6.31%  gmsh     gmsh                 [.] GModel::getVertexByTag
> > > >    4.91%  gmsh     libc-2.12.so         [.] malloc
> > > >    4.53%  gmsh     gmsh                 [.] InterpolateCurve
> > > >    4.40%  gmsh     libstdc++.so.6.0.22  [.] std::_Rb_tree_increment
> > > >    3.88%  gmsh     libc-2.12.so         [.] memcpy
> > > >    3.69%  gmsh     gmsh                 [.] List_Nbr
> > > >    3.35%  gmsh     libc-2.12.so         [.] _int_malloc
> > > >    3.24%  gmsh     gmsh                 [.] GEntity::GEntity
> > > >    3.13%  gmsh     gmsh                 [.] avl_lookup
> > > >    2.63%  gmsh     gmsh                 [.] gmshFace::resetNativePtr
> > > >    2.53%  gmsh     gmsh                 [.] GEdge::addFace
> > > >    2.52%  gmsh     gmsh                 [.] CompareVertex
> > > >    1.97%  gmsh     gmsh                 [.] std::_List_base<GEdge*, std::allocator<GEdge*>
> > > >    1.88%  gmsh     gmsh                 [.] GModel::getFaceByTag
> > > >    1.76%  gmsh     gmsh                 [.] Tree_Action
> > > >    1.74%  gmsh     gmsh                 [.] gmshEdge::degenerate
> > > >    1.63%  gmsh     gmsh                 [.] CompareCurve
> > > >    1.53%  gmsh     gmsh                 [.] std::_List_base<int, std::allocator<int> >::_M
> > > >    1.38%  gmsh     gmsh                 [.] GModel::deletePhysicalGroups
> > > >    1.23%  gmsh     gmsh                 [.] std::_List_base<GEdgeSigned, std::allocator<GE
> > > >    1.10%  gmsh     gmsh                 [.] GVertex::addEdge
> > > >    0.85%  gmsh     gmsh                 [.] List_Read
> > > >    0.75%  gmsh     gmsh                 [.] GEdge::getBeginVertex
> > > >    0.73%  gmsh     gmsh                 [.] GFace::computeMeanPlane
> > > >    0.73%  gmsh     libc-2.12.so         [.] free
> > > >    0.67%  gmsh     gmsh                 [.] CTX::instance
> > > >    0.65%  gmsh     gmsh                 [.] gmshEdge::resetMeshAttributes
> > > > 
> > > > This suggests that maybe GModel::getEdgeByTag is eventually called a quadratic number of times in the number of PointInVolume-objects -- and this causes the drastic slow-down.
> > > > 
> > > > Could you please investigate this? Thanks a lot!
> > > > (As already mentioned, this issue also occurs with gmsh-4.x.x)
> > > > 
> > > > Best regards
> > > > A. S.
> > > > _______________________________________________
> > > > gmsh mailing list
> > > > gmsh at onelab.info
> > > > http://onelab.info/mailman/listinfo/gmsh
> > > 
> > > — 
> > > Prof. Christophe Geuzaine
> > > University of Liege, Electrical Engineering and Computer Science 
> > > http://www.montefiore.ulg.ac.be/~geuzaine
> > > 
> > > 
> > > 
> > 
> > — 
> > Prof. Christophe Geuzaine
> > University of Liege, Electrical Engineering and Computer Science 
> > http://www.montefiore.ulg.ac.be/~geuzaine
> > 
> > 
> > 
> 
> — 
> Prof. Christophe Geuzaine
> University of Liege, Electrical Engineering and Computer Science 
> http://www.montefiore.ulg.ac.be/~geuzaine
> 
> 
> 

— 
Prof. Christophe Geuzaine
University of Liege, Electrical Engineering and Computer Science 
http://www.montefiore.ulg.ac.be/~geuzaine