A 16-core processor with both message-passing and shared-memory inter-core communication mechanisms is implemented in 65 nm CMOS. Message-passing communication is enabled in a 36 Mesh packet-switched network-on-chip, and shared-memory communication is supported using the shared memory within each cluster. The processor occupies 9.1 and operates fully functional at a clock rate of 750 MHz at 1.2 V and maximum 800 MHz at 1.3 V. Each core dissipates 34 mW under typical conditions at 750 MHz and 1.2 V while executing embedded applications such as an LDPC decoder, a 3780-point FFT module, an H.264 decoder and an LTE channel estimator. Index Terms—Chip multiprocessor, cluster-based, FFT, H.264 decoder, inter-core communication, inter-core synchronization, LDPC decoder, LTE channel estimator, message-passing, multi-core,