[Getdp] GetDP MPI versus single CPU

Артем Хорошев vskych at gmail.com
Wed Aug 9 17:09:09 CEST 2017


Hi.

First, a couple of tips for improving performance.
1) The generic (reference) BLAS is quite slow. Try an optimized implementation
such as ATLAS, OpenBLAS or MKL (the latter preferably on Intel CPUs), etc.
2) Using MPI ("mpirun") has many nuances (for example, a several-fold
performance degradation in the "Generate" operation with some large
unsymmetric matrices). Use it with caution, or use OpenMP instead (which
only helps in the factorization phase). See, for example, the OpenBLAS
or ACML or <...> documentation.
3) For symmetric matrices, use "cholesky" instead of "lu" (see also the
MUMPS and PETSc user manuals); a sketch of how to select it follows below.
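
If I am not mistaken, GetDP forwards unrecognized command-line options to
PETSc, so the factorization can be changed without editing the .pro file
(your log shows the default "preonly lu mumps"). The option names below are
taken from the PETSc 3.6 documentation; treat this as a sketch only, and use
Cholesky only if the system really is symmetric:

  # switch from the default LU to Cholesky factorization with MUMPS
  getdp magnet -solve MagSta_phi -pc_type cholesky -pc_factor_mat_solver_package mumps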

Concerning your problem: preprocessing ("-pre") and postprocessing ("-pos")
use only one thread, so running them under "mpirun" only decreases
performance. MPI ("mpirun") gives a gain only in the processing operation
("-cal").
Even there, some combinations of BLAS implementation and hardware (various
CPUs) show a sharp drop in performance when the number of processes
("-np N") exceeds cores/2.
Try to respect this condition, for example "mpirun -np 2 getdp magnet -cal
-cpu", and compare the "Wall" time reported between "Solve[A]" and
"SaveSolution[A]" with the same time on 1 process ("getdp magnet -cal -cpu").
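
Concretely, the comparison I have in mind looks like this on your CPU
(6 physical cores):

  # reference: processing phase on a single process
  getdp magnet -cal -cpu
  # MPI: keep -np at or below cores/2, so try 2 (or at most 3) here
  mpirun -np 2 getdp magnet -cal -cpu
  # in both runs, compare the "Wall" time reported between Solve[A] and SaveSolution[A]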



P.S.
I have no relationship to the development of this software (or of the
libraries it uses), and I have only a little knowledge of higher
mathematics. Almost all of the information given here was obtained
empirically and from the documentation of this software. I hope it helps
you.
In my case, I do not use MPI in its pure form. I use OpenBLAS or ACML (on
different hardware), which use OpenMP and thus allow several threads to be
used for the factorization phase (I use "mpirun" only in some cases).

And,

> Looking at your mail it seems that I might have to recompile BLAS, MUMPS
> and PETSC with particular options ?

On Ubuntu you can, for example, run "sudo apt install openblas*";
recompilation is not required thanks to "update-alternatives" (although in
some cases this does not work). After that you can use the
"OMP_NUM_THREADS" environment variable to set the number of threads (it
only affects the factorization phase of the "Solve[A]" operation); see the
sketch below.
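
Roughly, and assuming the Ubuntu 16.04 package and alternative names I
remember (they may differ slightly on your system), this looks like:

  # install an OpenMP-enabled BLAS (assumed package name: libopenblas-base)
  sudo apt install libopenblas-base
  # point the BLAS alternative at OpenBLAS (the alternative name may differ)
  sudo update-alternatives --config libblas.so.3
  # allow several threads during the factorization phase of Solve[A]
  export OMP_NUM_THREADS=6
  getdp magnet -cal -cpu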


2017-08-09 17:00 GMT+03:00 gilles quemener <quemener at lpccaen.in2p3.fr>:

> Hi,
>
> For BLAS, MUMPS and PETSC, I am using standard packages from Ubuntu 16.04
> with the following versions:
> - BLAS : libblas 3.6.0
> - MUMPS: libmumps 4.10.0
> - PETSC: libpetsc 3.6.2
> I do not know which compilation options Ubuntu 16.04 uses for these
> packages.
>
> For the processors, from /proc/cpuinfo:
> processor    : 0
> vendor_id    : GenuineIntel
> cpu family    : 6
> model        : 63
> model name    : Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
> stepping    : 2
> microcode    : 0x2d
> cpu MHz        : 1199.988
> cache size    : 15360 KB
>
>
> Looking at your mail it seems that I might have to recompile BLAS, MUMPS
> and PETSC with particular options ?
>
> If yes, which versions of these programs (especially for BLAS which can be
> found in many libraries) do you suggest to use ?
>
> Results from running getdp w/ options -cpu -v8 are in the attached file
> singleCPU.txt.
>
> Thanks a lot for your help.
>
> Gilles
>
>
> ------------------------------
>
> *From: *"Артем Хорошев" <vskych at gmail.com>
> *To: *"Gilles Quéméner" <quemener at lpccaen.in2p3.fr>
> *Cc: *"getdp" <getdp at onelab.info>
> *Sent: *Wednesday, August 9, 2017 15:19:26
> *Subject: *Re: [Getdp] GetDP MPI versus single CPU
>
> Which versions of BLAS and MUMPS do you use?
> With what options was PETSc configured?
> Run getdp with the "-cpu -v 8" options; if it hangs, repeat without "-cpu".
> Which processors do you use?
>
> 2017-08-09 10:45 GMT+03:00 gilles quemener <quemener at lpccaen.in2p3.fr>:
>
>> Hi,
>>
>> I have compiled GetDP with the MPI option under Linux Ubuntu 16.04 on a
>> machine w/ 6x2 CPUs and 32 GB of RAM.
>> When running the magnet.pro file given in the GetDP demos folder, I was
>> expecting to get faster results with MPI
>> than w/o and was quite surprised by the comparison, as shown by the
>> following outputs :
>>
>> 1) MPI run:
>> ***********
>>
>> mpirun -np 12 /home/quemener/local/OneLab_gq/GetDP/bin/getdp magnet
>> -solve MagSta_phi -pos MagSta_phi
>> Info    : Running '/home/quemener/local/OneLab_gq/GetDP/bin/getdp magnet
>> -solve MagSta_phi -pos MagSta_phi' [GetDP 2.11.1, 12 nodes]
>> Info    : Started (Wed Aug  9 09:20:20 2017, Wall = 0.277839s, CPU =
>> 0.884s [0.04s,0.116s], Mem = 287.254Mb [23.4609Mb,27.6289Mb])
>> Info    : Increasing process stack size (8192 kB < 16 MB)
>> Info    : Loading problem definition 'magnet.pro'
>> Info    : Loading problem definition 'magnet_data.pro'
>> Info    : Loading problem definition '../templates/MaterialDatabase.pro'
>> Info    : Loading problem definition '../templates/MaterialMacros.pro'
>> Info    : Loading problem definition '../templates/Magnetostatics.pro'
>> Info    : Selected Resolution 'MagSta_phi'
>> Info    : Loading Geometric data 'magnet.msh'
>> Info    : System 'A' : Real
>> P r e - P r o c e s s i n g . . .
>> Info    : Treatment Formulation 'MagSta_phi'
>> Info    :   Generate ExtendedGroup '_CO_Entity_15' (NodesOf)
>> Info    : [rank   8] System 1/1: 1658225 Dofs
>> Info    : [rank   7] System 1/1: 1658225 Dofs
>> Info    : [rank   3] System 1/1: 1658225 Dofs
>> Info    : [rank   4] System 1/1: 1658225 Dofs
>> Info    : [rank   1] System 1/1: 1658225 Dofs
>> Info    : [rank   2] System 1/1: 1658225 Dofs
>> Info    : [rank  10] System 1/1: 1658225 Dofs
>> Info    : [rank   6] System 1/1: 1658225 Dofs
>> Info    : [rank   5] System 1/1: 1658225 Dofs
>> Info    : [rank  11] System 1/1: 1658225 Dofs
>> Info    : [rank   0] System 1/1: 1658225 Dofs
>> Info    : [rank   9] System 1/1: 1658225 Dofs
>> Info    : (Wall = 30.3178s, CPU = 181.936s [14.804s,16.964s], Mem =
>> 8228.93Mb [685.133Mb,689.43Mb])
>> E n d   P r e - P r o c e s s i n g
>> P r o c e s s i n g . . .
>> Info    : CreateDir[../templates/res/]
>> Info    : Generate[A]
>> Info    : Solve[A]
>> Info    : N: 1658225 - preonly lu mumps
>> Info    : SaveSolution[A]
>> Info    : (Wall = 87.9995s, CPU = 489.904s [38.456s,48.752s], Mem =
>> 14249.9Mb [1109.46Mb,1297.96Mb])
>> E n d   P r o c e s s i n g
>> P o s t - P r o c e s s i n g . . .
>> Info    : NameOfSystem not set in PostProcessing: selected 'A'
>> Info    : Selected PostProcessing 'MagSta_phi'
>> Info    : Selected Mesh 'magnet.msh'
>> Info    : PostOperation 1/4
>>           > 'res/MagSta_phi_hc.pos'
>> Info    : PostOperation 2/4
>>           > 'res/MagSta_phi_phi.pos'
>> Info    : PostOperation 3/4
>>           > 'res/MagSta_phi_h.pos'
>> Info    : PostOperation 4/4
>>           > 'res/MagSta_phi_b.pos'
>> Info    : (Wall = 377.562s, CPU = 1118.17s [80.556s,207.128s], Mem =
>> 14254.1Mb [1109.95Mb,1297.96Mb])
>> E n d   P o s t - P r o c e s s i n g
>> Info    : Stopped (Wed Aug  9 09:26:37 2017, Wall = 377.851s, CPU =
>> 2012.12s [162.712s,207.26s], Mem = 14254.1Mb [1109.95Mb,1297.96Mb])
>>
>>
>> 2) Single CPU run:
>> ******************
>>
>> /home/quemener/local/OneLab_gq/GetDP/bin/getdp magnet -solve MagSta_phi
>> -pos MagSta_phi
>> Info    : Running '/home/quemener/local/OneLab_gq/GetDP/bin/getdp magnet
>> -solve MagSta_phi -pos MagSta_phi' [GetDP 2.11.1, 1 node]
>> Info    : Started (Wed Aug  9 09:27:06 2017, Wall = 0.146171s, CPU =
>> 0.136s, Mem = 25.9609Mb)
>> Info    : Increasing process stack size (8192 kB < 16 MB)
>> Info    : Loading problem definition 'magnet.pro'
>> Info    : Loading problem definition 'magnet_data.pro'
>> Info    : Loading problem definition '../templates/MaterialDatabase.pro'
>> Info    : Loading problem definition '../templates/MaterialMacros.pro'
>> Info    : Loading problem definition '../templates/Magnetostatics.pro'
>> Info    : Selected Resolution 'MagSta_phi'
>> Info    : Loading Geometric data 'magnet.msh'
>> Info    : System 'A' : Real
>> P r e - P r o c e s s i n g . . .
>> Info    : Treatment Formulation 'MagSta_phi'
>> Info    :   Generate ExtendedGroup '_CO_Entity_15' (NodesOf)
>> Info    : System 1/1: 1658225 Dofs
>> Info    : (Wall = 10.7856s, CPU = 7.42s, Mem = 688.309Mb)
>> E n d   P r e - P r o c e s s i n g
>> P r o c e s s i n g . . .
>> Info    : CreateDir[../templates/res/]
>> Info    : Generate[A]
>> Info    : Solve[A]
>>
>> Info    : N: 1658225 - preonly lu mumps
>> Info    : SaveSolution[A]
>> Info    : (Wall = 40.7256s, CPU = 47.028s, Mem = 4496.32Mb)
>> E n d   P r o c e s s i n g
>> P o s t - P r o c e s s i n g . . .
>> Info    : NameOfSystem not set in PostProcessing: selected 'A'
>> Info    : Selected PostProcessing 'MagSta_phi'
>> Info    : Selected Mesh 'magnet.msh'
>> Info    : PostOperation 1/4
>>           > 'res/MagSta_phi_hc.pos'
>> Info    : PostOperation 2/4
>>           > 'res/MagSta_phi_phi.pos'
>> Info    : PostOperation 3/4
>>           > 'res/MagSta_phi_h.pos'
>> Info    : PostOperation 4/4
>>           > 'res/MagSta_phi_b.pos'
>> Info    : (Wall = 324.758s, CPU = 132.924s, Mem =
>> 4496.32Mb)
>> E n d   P o s t - P r o c e s s i n g
>> Info    : Stopped (Wed Aug  9 09:32:31 2017, Wall = 324.946s, CPU =
>> 133.032s, Mem = 4496.32Mb)
>>
>> Does anyone have clues to explain such a behaviour?
>>
>>        Gilles
>>
>>
>> _______________________________________________
>> getdp mailing list
>> getdp at onelab.info
>> http://onelab.info/mailman/listinfo/getdp
>>
>>
>