Communicating Threads and Processes in Java: an approach to reliable parallel programming
Manuel I. Capel-Tuñón
University of Granada, Spain
mcapel@ugr.es
Overview
– Although not initially designed as a High Performance Parallel Programming language, Java is an attractive candidate for it.
– This talk reviews a selection of programming models and systems that bring HP Computing to Java.
Motivation
Web-based global computing:
– High demand for geographically distributed resources
– Useful for “strategic” applications: financial modeling, computational genetics, weather forecasting, etc.
– Taking advantage of spare cycles in computers across the Internet is now technologically feasible
Its programming features will also make Java the language of choice for numerical computing
Talk outline
Overview
Motivation
The Java programming model
Multithread programming in Java
Data Parallelism in Java
The “Grid” High Performance Computing
Numerical Computing in Java
Conclusion
The Java Parallel Programming Model
– Fits better with control parallelism
– Threads are created and managed explicitly as objects
– Weak-consistency memory model (easily mapped onto shared-memory systems)
– Creation methods for groups of threads are not provided
– Remote Method Invocation for distributed systems
Java’s software architecture
[Figure: javac compiles source files into byte-code classes (X.class, Y.class) on local disk; further classes (A.class, B.class) arrive over the network. Inside the JVM, each thread has its own stack and working memory and assigns/uses values against central main memory; the heap is garbage collected. The JVM sits on native libraries, operating-system services (files, GUI) and the hardware (main memory, cache).]
Models for running Java byte-code
Hardware
Native code compiler
Just-In-Time (JIT) compiler
Java interpreter
Java’s software architecture
[Figure: the execution path in detail: classes from local disk or the network pass through the class loader and byte-code verifier; a JIT compiler turns byte-code into a native executable; JNI gives access to native libraries; the garbage-collected heap and stack live in main memory, over the operating system and hardware.]
Java’s memory model
[Figure: the JVM’s main memory holds the master copies of variables; each thread has a working memory and a stack, assigns/uses values locally, and moves them to and from main memory with store/load; operands and results cross the thread/main-memory boundary explicitly.]
– Weak consistency: updates to shared variables are only made visible to other threads in synchronized code blocks
– The JVM uses a computation model based on a (global) operand stack
– Efficient for code interpretation, but not for execution on register-based processors
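As a minimal sketch of the weak-consistency rule above (class, field and method names are this example's own, not from the talk), the following makes a writer thread's update reliably visible to a reader by having both synchronize on the same object:

```java
// Under Java's weak consistency, an update by one thread is only guaranteed
// visible to another when both synchronize on the same lock (or use volatile).
public class Visibility {
    private int sharedValue;   // guarded by the object's lock
    private boolean ready;     // guarded by the object's lock

    public synchronized void publish(int v) {
        sharedValue = v;       // write inside a synchronized block...
        ready = true;
        notifyAll();           // ...is made visible to readers of the same lock
    }

    public synchronized int consume() throws InterruptedException {
        while (!ready)
            wait();            // releases the lock and re-reads state on wakeup
        return sharedValue;
    }

    public static void main(String[] args) throws Exception {
        Visibility vis = new Visibility();
        Thread writer = new Thread(() -> vis.publish(42));
        writer.start();
        System.out.println(vis.consume()); // prints 42
        writer.join();
    }
}
```

Without the synchronized blocks (or volatile), the reader could legally spin forever on a stale copy of ready in its working memory.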
Taking advantage of registers

int A, B, C, j;
A = 4;
B = 8;
for (j = 0; j < 10; j++) {
    C = A + B;
    ... // A and B are modified in the body of the loop
}

– A register-based architecture will load the variables once for the entire loop
– The JVM pushes A and B onto the stack each time C is computed!
– A byte-code to native code compiler will eliminate this execution overhead
Talk outline
Overview
Motivation
The Java programming model
Multithread programming in Java
Data Parallelism in Java
The “Grid” High Performance Computing
Numerical Computing in Java
Conclusion
Multithread programming in Java
Java contains a series of constructs and specific classes for concurrent programming:
– java.lang.Thread (implements the Runnable interface; all classes extend Object)
– java.lang.Object: wait(), notify(), notifyAll()
– the synchronized and volatile keywords
Multithreading is a basic feature for the implementation of good applications in Java
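A minimal sketch of the constructs just listed (class and thread names are this example's own): a thread is created explicitly as an object, here by passing a Runnable to the Thread constructor.

```java
public class HelloThreads {
    public static void main(String[] args) throws InterruptedException {
        // A Runnable holds the code to execute; the Thread object manages it.
        Runnable task = () -> System.out.println(
            "running in " + Thread.currentThread().getName());
        Thread t1 = new Thread(task, "worker-1");
        Thread t2 = new Thread(task, "worker-2");
        t1.start();   // start() makes the thread runnable
        t2.start();
        t1.join();    // join() blocks until the thread terminates
        t2.join();
    }
}
```

Subclassing Thread and overriding run() is the equivalent alternative; the Runnable form keeps the class hierarchy free for application code.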
Thread states and scheduling
[State diagram: new → runnable via start(); runnable ↔ running under the scheduler and yield(); running → blocked when I/O begins or on sleep(), wait() or join(); blocked → runnable when I/O ends, sleep ends, on notify()/notifyAll(), or when the joined thread ends; suspend() moves running/blocked threads to the suspended/suspended-blocked states, and resume() returns them.]
There is a set of methods which produce a change in the state of a given thread:
– new
– start()
– wait()
– notify()
– notifyAll()
– yield()
– sleep(time)
– join()
– suspend()
– resume()
– stop()
The ThreadGroup Class
– Java does not provide methods to create or start all thread members simultaneously
– Collective communication must also be explicitly programmed
– By adding mutual exclusion and auxiliary variables, a low-level programming style results!

barrier.counter = num_threads;

public void Barrier(Semaphore barrier) {
    barrier.counter -= 1;
    if (barrier.counter > 0)
        barrier.wait();
    else
        barrier.notifyAll();
}
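The counter-and-wait scheme sketched above later became a library primitive: java.util.concurrent (added in J2SE 5.0, after this talk) provides CyclicBarrier, which packages the same logic. A minimal sketch, assuming a modern JDK:

```java
import java.util.concurrent.CyclicBarrier;

public class BarrierDemo {
    public static void main(String[] args) throws InterruptedException {
        final int numThreads = 3;
        // The barrier action runs exactly once, when all parties have arrived.
        CyclicBarrier barrier = new CyclicBarrier(numThreads,
            () -> System.out.println("all threads reached the barrier"));
        Runnable worker = () -> {
            try {
                barrier.await();   // block until numThreads threads arrive
            } catch (Exception e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread[] threads = new Thread[numThreads];
        for (int i = 0; i < numThreads; i++) {
            threads[i] = new Thread(worker);
            threads[i].start();
        }
        for (Thread t : threads) t.join();
    }
}
```

Compared with the slide's hand-rolled version, the library class handles the mutual exclusion and reuse across generations internally.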
Synchronization between asynchronous threads
[Diagram: calling threads queue on a synchronized method; a caller acquires the object’s lock, executes the synchronized block, and notify moves a thread from the anonymous condition queue back to the ready queue; the lock is released on exit.]

public synchronized void deposit(double v) {
    while (count == slots)
        try { wait(); } catch (...) { }
    buffer[pIn] = v;
    ... count++;
    if (count == 1) notify();
}

public synchronized double fetch() {
    while (count == 0)
        try { wait(); } catch (...) { }
    v = buffer[pOut];
    ... count--;
    if (count == slots-1) notify();
}

– Synchronized methods give a simple way of exclusive access and avoid race hazards between threads
– Java thread synchronization is loosely based on the monitor construct with signal-and-continue semantics
– Synchronization in the access to shared variables is problematic, since the semantics of notify() is error prone
– The logic within the monitor methods involved in wait-notify pairs has to be tightly coupled
– Monitors, as passive entities, cannot prevent their methods being called
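Filling out the deposit/fetch fragments above into a self-contained, runnable bounded buffer (the circular-index bookkeeping and notifyAll choice are this sketch's own additions):

```java
public class BoundedBuffer {
    private final double[] buffer;
    private int pIn = 0, pOut = 0, count = 0;
    private final int slots;

    public BoundedBuffer(int slots) {
        this.slots = slots;
        this.buffer = new double[slots];
    }

    public synchronized void deposit(double v) throws InterruptedException {
        while (count == slots) wait();   // buffer full: wait in the condition queue
        buffer[pIn] = v;
        pIn = (pIn + 1) % slots;
        count++;
        notifyAll();                     // notifyAll avoids waking the wrong waiter
    }

    public synchronized double fetch() throws InterruptedException {
        while (count == 0) wait();       // buffer empty: wait
        double v = buffer[pOut];
        pOut = (pOut + 1) % slots;
        count--;
        notifyAll();
        return v;
    }

    public static void main(String[] args) throws InterruptedException {
        BoundedBuffer b = new BoundedBuffer(2);
        Thread producer = new Thread(() -> {
            try { for (int i = 1; i <= 5; i++) b.deposit(i); }
            catch (InterruptedException e) { }
        });
        producer.start();
        double sum = 0;
        for (int i = 0; i < 5; i++) sum += b.fetch();
        producer.join();
        System.out.println(sum);         // prints 15.0
    }
}
```

Using notifyAll() instead of the slide's single notify() trades a few spurious wakeups for safety when several producers and consumers share the anonymous condition queue.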
Producer-consumer data sharing

class Barber_shop {
    .... // monitor variables
    public synchronized void get_haircut() { // called by customers
        while (barber == 0) wait();
        barber--; chair++; notify();
        ...
    }
    public synchronized void get_next_customer() { // called by the barber
        barber++; notify();
        while (chair == 0) wait();
        chair--;
    }
    ...
} // end class

Semantics of notifications: "I'm notifying: the next client can be served!"
Correct producer-consumer monitor

Monitor Barber_shop {
    .... // monitor variables
    procedure get_haircut; // called by customers
        if (barber == 0) available.wait;
        barber--;
        chair++;
        if (NotOccupied.queue) NotOccupied.signal;
    procedure get_next_customer; // called by the barber
        barber++;
        if (available.queue) available.signal;
        if (chair == 0) NotOccupied.wait;
        chair--;
} // end Monitor

Semantics of signals: "I'm signaling the next awaiting client!"
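The monitor pseudocode above can be approximated in Java with java.util.concurrent.locks (J2SE 5.0, after this talk), whose named Condition objects replace the single anonymous queue; this sketch keeps the waits in while loops because Java still gives only signal-and-continue semantics (all identifiers here are this example's own):

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class BarberShop {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition available = lock.newCondition();    // barber available
    private final Condition notOccupied = lock.newCondition();  // chair occupied
    private int barber = 0, chair = 0;

    public void getHaircut() throws InterruptedException { // called by customers
        lock.lock();
        try {
            while (barber == 0) available.await();
            barber--;
            chair++;
            notOccupied.signal();
        } finally {
            lock.unlock();
        }
    }

    public void getNextCustomer() throws InterruptedException { // called by the barber
        lock.lock();
        try {
            barber++;
            available.signal();
            while (chair == 0) notOccupied.await();
            chair--;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BarberShop shop = new BarberShop();
        Thread customer = new Thread(() -> {
            try { shop.getHaircut(); } catch (InterruptedException e) { }
        });
        customer.start();
        shop.getNextCustomer();
        customer.join();
        System.out.println("haircut done");
    }
}
```

Separate condition queues remove the slide's notify() ambiguity about which waiter is being addressed, even though the signal-and-wait semantics of a true Hoare monitor is still not available.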
Talk outline
Overview
Motivation
The Java programming model
Multithread programming in Java
Data Parallelism in Java
The “Grid” High Performance Computing
Numerical Computing in Java
Conclusion
Data Parallelism in Java
Simultaneous operations on disjoint partitions of
data by multiple processors
Data give the parallel dimension of programs in
this model
Data Parallelism in Java
[Figure: three matrices A (elements a11…amn), B (b11…bmk) and C (c11…ckn) partitioned by rows; data dependencies determine the data-to-processor mapping onto a cluster of processors.]
Data Parallelism in Java
Homogeneous parallelism, lightweight processes,
regularly spawned, ordered event sequences
Idiomatic features:
– Producer-consumer operation for synchronization and
data communication between threads
– Creating and starting multiple threads simultaneously
– Collective communication/synchronization between sets
of threads
Java makes expressing data parallelism awkward
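What the bullets above look like in plain Java (every name below is this sketch's own): a fixed set of threads is spawned over disjoint partitions of an array, and join() serves as the collective synchronization at the end.

```java
public class DataParallelSum {
    public static void main(String[] args) throws InterruptedException {
        final int n = 1000, numThreads = 4;
        final double[] data = new double[n];
        for (int i = 0; i < n; i++) data[i] = 1.0;

        final double[] partial = new double[numThreads];
        Thread[] workers = new Thread[numThreads];
        for (int t = 0; t < numThreads; t++) {
            final int id = t;
            final int lo = id * n / numThreads;        // disjoint row partition
            final int hi = (id + 1) * n / numThreads;
            workers[id] = new Thread(() -> {
                double s = 0;
                for (int i = lo; i < hi; i++) s += data[i];
                partial[id] = s;                        // one slot per thread: no sharing
            });
            workers[id].start();
        }
        double total = 0;
        for (int t = 0; t < numThreads; t++) {
            workers[t].join();                          // collective synchronization
            total += partial[t];
        }
        System.out.println(total);                      // prints 1000.0
    }
}
```

The boilerplate of spawning, partitioning and joining by hand is exactly the awkwardness the slide points at: none of it is provided by the language.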
SPMD Java Library Interface
[Figure: a Travelling Salesman Problem (TSP) instance over German cities (Köln, Koblenz, Dortmund, Kassel, Frankfurt; distances D1–D7), solved with a global work queue and a shared best-solution (BS) structure.]
– Localization of shared data structures as objects of an SPMD-Java library
– Explicit communication between processes is now needed
– Communication links among threads will be encapsulated in shared objects
– Elements of dynamic objects may be migrated during program execution
– Thread groups that share an object define their connection topology
The proposed solution
[Figure: distributed branch-and-bound (BB) application construction for the TSP; a master assigns subproblems and updates the best solution (BS), while slaves load and update local copies (bs) through a Best-solution DOF and a Work-queue DOF, within the processor limits.]
– Specification of a logical topology for a given distribution of global data, so that access from remote servers to locally assigned data is made easier
– Methodological software construction based on distributed active objects (DOFs) which encapsulate global data structures and topologies
– Processes and DOFs are distributed to the servers by the programmer according to each parallel distributed application
– The approach aims at hiding low-level communication, and at providing global data distribution transparency and access locality to any object in the program
The programming environment
[Figure: a Distributed Object Fragment (DOF) contains an objectProxy with Input and Output Handlers and an application interface exposing externalComm(...) and send(...); the application process sits on top, and the DOF is layered over the run time + JVM, the socket library, and the transport and communication layers.]
– Network communications between the APs are hidden behind the interface of a class of objects
– A class of active objects, called DOFs, establishes communication links between them according to a logical connection topology
– Virtual methods, such as send(..), externalComm(..) and others, must be explicitly programmed
– Proxies encapsulate data and provide communication facilities to the application processes
Talk outline
Overview
Motivation
The Java programming model
Multithread programming in Java
Data Parallelism in Java
The “Grid” High Performance Computing
Numerical Computing in Java
Conclusion
A step ahead: the “Grid” computing
[Figure: heterogeneous resources (Linux, SGI, NT and Solaris machines and clusters, idle processors, data repositories) linked over a network; work is spawned from “here” to remote nodes.]
Collect spare cycles, and computational power will be enhanced!
Wide-scale HP programming paradigm
[Figure: several users, each with data and concurrent processes, share pooled computational resources across the network.]
Grid Computing:
– merging and splitting of multiple virtual machines
– platform and performance portability, safety, reusability
– multilanguage support to include Java, C, C++, and Fortran
– a supported message-passing paradigm for parallel and distributed computing
Remote Method Invocation
[Figure: a client program makes a remote method invocation on a server program’s remote procedure; parameters are packed, carried as a message by Sun’s RMI protocol + serialisation over the transport layer and the Internet, then unpacked; results travel back the same way.]
– RMI supports polymorphism at object’s method invocation
– Java passes objects by reference
– References as parameters of an RMI call have to be passed in a network-wide representation
– The current RMI implementation rules out transparent remote invocation of methods
Programming with Java’s RMI

The Interface:
public interface Hello extends Remote {
    public String sayHello() throws java.rmi.RemoteException;
}

The Server side:
public class HelloImpl extends UnicastRemoteObject
        implements Hello { // previously declared interface
    public HelloImpl() throws RemoteException {
        super();
    }
    public String sayHello() throws RemoteException {
        return "Hello World!";
    }
    public static void main(String args[]) {
        try {
            HelloImpl h = new HelloImpl();
            Naming.rebind("hello", h);
        } catch (RemoteException re) {
            ...
        }
    }
}

The Client side:
public static void main(String args[]) {
    System.setSecurityManager(new RMISecurityManager());
    try {
        Hello h = (Hello) Naming.lookup("rmi://ockham.ugr.es/hello");
        String message = h.sayHello();
        System.out.println("HelloClient: " + message);
    } catch (RemoteException re) {
        ...
    }
}
Implementation of Java’s RMI
[Figure: HelloClient resolves the remote object through the Registry and loads HelloImpl_Stub.class; the stub forwards the call to HelloImpl_Skel.class on the server, which dispatches to the implementation and returns “Hello World!”.]
– There is no access transparency at all in Java’s RMI
– The large difference in performance between the Java interpreter and JIT compilers indicates that RMI involves an amount of inefficient Java code
Programming with sockets
The socket version of a program is faster than the RMI one, but
results in increased program size and thus reduces productivity
and maintainability
The implementation of programs becomes more difficult and
inefficient:
– The programmer has to write a communication protocol
– Communication is handled by the operating system
Neither the socket nor the RMI version can take advantage of
locality in distributed applications
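To make the trade-off concrete, here is a hedged sketch of the socket style (class names and the one-line protocol are this example's own): even for a single echoed message, the programmer must define the wire protocol, manage streams, and thread the server by hand.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class EchoDemo {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {      // 0 = any free port
            Thread serverThread = new Thread(() -> {
                try (Socket s = server.accept();
                     BufferedReader in = new BufferedReader(
                         new InputStreamReader(s.getInputStream()));
                     PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                    out.println(in.readLine());  // hand-written protocol: echo one line
                } catch (Exception e) { }
            });
            serverThread.start();

            try (Socket client = new Socket("localhost", server.getLocalPort());
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                     new InputStreamReader(client.getInputStream()))) {
                out.println("Hello sockets");
                System.out.println(in.readLine());
            }
            serverThread.join();
        }
    }
}
```

Everything RMI does automatically (marshalling, dispatch, failure signalling) must be layered on top of this by the application, which is the productivity cost the slide describes.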
Getting Explicit Parallelism
Optimized Standard JVM Implementations:
– Substituting RMI and sockets with more efficient implementations, while preserving JVM byte-codes
– Advantages: truly object oriented, supports all data types of Java programming, is garbage collected
– Drawbacks: RMI is too slow for HPP programming
Native code JVM Implementations:
– Through a common Message Passing Java (MPJ) API
– Advantages: full performance of native code; extensive code optimizations; a basis for conversion between C, C++, Fortran and Java
– Drawbacks: programming features of Java could be compromised
JavaParty: an “optimized implementation” of RMI
[Figure: JavaParty software architecture: a Run Time Manager coordinates local JP runtime environments, creating class objects, migrating objects together with their current state, resetting them, and giving access to the distributed environment.]
– Objects can migrate between nodes transparently to the programmer
– JP extends Java’s RMI with a pre-processor and a runtime
– The runtime system is used to access the static entities of each class
– For each class that is loaded dynamically a single object is created remotely
– JP improves locality and reduces communication time
Manta system: an efficient
implementation of RMI
The objective is to push the runtime overhead to compile
time while supporting polymorphic remote method
invocation and allowing interoperability with other JVMs.
By using a native Java compiler it is possible for the
performance of RMI to equal that of other parallel
languages
Manta supports dynamic class loading by compiling
methods and creating serializations at run time
To support interoperability with other JVMs, Manta has a
byte-code-to-native compiler startable at run time
Manta/JVM interoperability
[Figure: a Manta process (application + generated serializers) speaks the Manta RMI protocol to other Manta processes, and the Sun RMI protocol + generic serialisation to a JVM running a Sun RMI application; class files are fetched from HTTP servers, compiled with serializer generation by the byte-code compiler on the Manta side, and loaded by the byte-code loader on the JVM side.]
Talk outline
Overview
Motivation
The Java programming model
Multithread programming in Java
Data Parallelism in Java
The “Grid” High Performance Computing
Numerical Computing in Java
Conclusion
Numerical Computing in Java

Complex x = new Complex(5, 2);
Complex y = new Complex(2, -3);
Complex z = a.times(x).plus(y);

[Figure: two parallel one-dimensional arrays X and Y, indexed 0..4, standing in for a true two-dimensional array.]

Wide-scale adoption of Java as a language for numerical computing faces current difficulties to overcome:
– Inefficient support for complex numbers
– Lack of multidimensional arrays
– Over-restrictive floating-point semantics
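A minimal Complex class in the style of the snippet above (a hypothetical sketch; real libraries such as JNL define richer versions). Every times/plus call allocates a fresh object, which is exactly the inefficiency the slide points at.

```java
public final class Complex {
    private final double re, im;

    public Complex(double re, double im) { this.re = re; this.im = im; }

    // Each operation returns a new immutable object.
    public Complex plus(Complex o) {
        return new Complex(re + o.re, im + o.im);
    }

    public Complex times(Complex o) {
        return new Complex(re * o.re - im * o.im, re * o.im + im * o.re);
    }

    @Override public String toString() {
        return re + (im < 0 ? "" : "+") + im + "i";
    }

    public static void main(String[] args) {
        Complex a = new Complex(1, 0);       // 'a' is assumed to be 1+0i here
        Complex x = new Complex(5, 2);
        Complex y = new Complex(2, -3);
        Complex z = a.times(x).plus(y);      // as in the slide's snippet
        System.out.println(z);               // prints 7.0-1.0i
    }
}
```

With value types or operator overloading (neither available in Java), the same expression would compile to plain floating-point arithmetic instead of two heap allocations.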
Computational Performance
– Incomplete access to the IEEE floating-point standard: extended-precision hardware (such as IEEE 754 double extended precision, or long double) is inaccessible to Java programs!
– Computational reproducibility comes at the cost of speed and precision
Numerical Libraries in Java
Native code libraries:
– Advantages: easy porting through JNI, legacy code reuse; adherence to standards (MPI, LAPACK, etc.)
– Compromised: robustness, reproducibility, portability, performance
Java-programmed libraries:
– Freely distributed
– Strong dependency on future Java specifications, JIT optimizations, etc.
– Promising results for currently ported libraries: JNL (Visual Numerics), JAMA (MathWorks), NIST
Talk outline
Overview
Motivation
The Java programming model
Multithread programming in Java
Data Parallelism in Java
The “Grid” High Performance Computing
Numerical Computing in Java
Conclusion
Performance of RMI protocols
1630
– JDK serialization mechanism
RPC conventional
1500
1311
Sun JDK 1.1.4
Latency (microsec.)
1250
Latency (microsec.)
1000
Sun JIT 1.2
720
750
500
250
0
Reasons for RMI overhead:
– stream management and
data copying to external
buffers
– method dispatch and low
level network
communications
JavaParty+KaRMI
+optimizations
The overhead is currently in the
range of 0.4 to 2.2 ms.
Serialization can take up to 65%
of time in slower JDKs
228
Java performance
[Chart: SciMark scores (Mflops) on a 500 MHz PIII using full optimization: C Borland 5.5 at 58, with MS VC++ 5.0, Java Sun 1.2 and Java MS 1.1.4 at roughly 45, 42 and 40.]
– Performance varies greatly across computing platforms: highest scores on Intel and AMD Athlon; lowest on UltraSPARC, SGI MIPS, Alpha EV6
– Competitive with C compilers
– Performance mainly depends on the implementation technology of the JVM
Synoptic comparison of proposals

| Proposal  | E          | C   | M        | S   | L            | P   | S                | N   |
|-----------|------------|-----|----------|-----|--------------|-----|------------------|-----|
| RMI       | Low        | No  | No       | Yes | TCP/IP       | Yes | Sun              | USA |
| JavaParty | Medium     | No  | Yes      | Yes | TCP/IP       | Yes | Sun              | D   |
| SPMD      | Medium-low | No  | No       | Yes | TCP/IP       | Yes | Does not apply   | USA |
| JDSM      | High       | Yes | Possible | Yes | Several      | Yes | Socket interface | JP  |
| Manta     | High       | Yes | No       | No  | Own protocol | Yes | Own protocol     | NL  |

E: Efficiency; C: Code optimization; M: Object migration; S: Standardization; L: Low-level communications; P: Polymorphism; S: Serialisation protocol; N: Nationality of research.
Summary & Conclusions
– The standard constructs provided by Java (monitor/thread model) for multithreading have been identified as a serious hindrance to developing reliable parallel distributed software
– Integration of communication frameworks (MPI, RMI, DSM, etc.) is necessary to cover the range of portability, uploading, soft install, migration, etc. of today's applications
– Possible solutions take advantage of the existing Java framework for the development and integration of many "Grid" applications
– A variety of solutions aimed at computational performance for High Performance Computing needs (Data Parallelism, Grid applications and Numerical Computing) have been reviewed
References and further addresses
V. Getov, G. Laszewski, M. Philippsen, I. Foster. "Multi-Paradigm Communications in Java for Grid Computing". CACM, 44, 10, pp. 118-124, 2001.
R. Boisvert, J. Moreira, M. Philippsen, R. Pozo. "Java and Numerical Computing". IEEE Computing in Science and Engineering, 3, 2, pp. 18-24, 2001.
The Java Grande Community official page: http://www.javagrande.org
Numeric class libraries: http://math.nist.gov/javanumerics
Personal page of M. Philippsen at Karlsruhe: http://wwwipd.ira.uka.de
The Manta project: http://www.cs.vu.nl