Mehul A. Shah

[ Bio | Resume | Papers | Thesis ]

I am currently pursuing what I love to do at Aryn.

In my spare time, I audit the world's best sorting algorithms and platforms as a member of the Sort Benchmark committee.

I have a personal blog where I post random musings on technology and life. I also have a family blog where I write about recent events in my family.


In the past decade or more, two technology trends have intersected: the cloud, with its abundance of on-demand computing resources, and the ubiquity of data. Their intersection makes it cheap to learn from data and makes previously intractable problems feasible. In my work, I have leveraged this to build more efficient, smarter, and easier-to-use cloud data systems.

I'm currently focusing on the most important things in life: family and pursuing my passions in cloud and data at Aryn.

At Google, I was VP of Engineering for Streams and Lakes - the data integration, streaming, and open source analytics services on Google Cloud.

At AWS, I ran Search Services, which includes Amazon OpenSearch Service (successor to Amazon Elasticsearch), Open Distro, and Amazon CloudSearch. I also launched and ran two fast-growing cloud services, AWS Lake Formation and AWS Glue, and managed engineering teams in Amazon Redshift.

Prior to Amazon, I was co-founder and CEO of Amiato (2011-2014), a managed ETL service in the cloud (acquired by Amazon). From 2004-2011, I was a principal scientist at HP Labs where my work spanned large-scale data management, distributed systems, and energy-efficient computing. This work has been published in top-tier database and systems conferences and has won several awards. Prior to HP, I received my PhD from U.C. Berkeley (2004) for adding parallelism, fault-tolerance, and load-balancing to the TelegraphCQ data-stream processing system. In 1999, I worked on the IBM DB2/UDB database. I received an MEng in 1997 and BS in Computer Science and Physics in 1996, all from MIT. In my spare time, I serve on the Sort Benchmark committee.


The best summary of my career work is here.


This list is out of date. Please see my DBLP entry for a complete list of my publications.


Analyzing consistency properties for fun and profit.
Wojciech M. Golab, Xiaozhou Li, Mehul A. Shah.
PODC 2011.


What consistency does your key-value store actually provide?
Eric Anderson, Xiaozhou Li, Mehul A. Shah, Joseph Tucek, and Jay J. Wylie.
HotDep 2010.

Efficient eventual consistency in Pahoehoe, an erasure-coded key-blob archive.
Eric Anderson, Xiaozhou Li, Arif Merchant, Mehul A. Shah, Kevin Smathers, Joseph Tucek, Mustafa Uysal, Jay J. Wylie.
DSN 2010.

Analyzing the energy efficiency of a database server.
Dimitris Tsirogiannis, Stavros Harizopoulos, Mehul A. Shah.
SIGMOD 2010.


Sinfonia: A new paradigm for building scalable distributed systems.
Marcos K. Aguilera, Arif Merchant, Mehul A. Shah, Alistair C. Veitch, Christos T. Karamanolis.
ACM Trans. Comput. Syst. 27(3): 2009.

Tracking the Power in an Enterprise Decision Support System.
Justin Meza, Mehul A. Shah, Parthasarathy Ranganathan, Mike Fitzner, and Judson Veazey.
International Symposium on Low Power Electronics and Design (ISLPED), August 2009.
This is not available elsewhere on the ISLPED site, so feel free to link here.

Query Processing Techniques for Solid State Drives.
Dimitris Tsirogiannis, Stavros Harizopoulos, Mehul A. Shah, Janet L. Wiener, and Goetz Graefe.
ACM SIGMOD, July 2009.

Operating System Support for NVM+DRAM Hybrid Main Memory.
Jeffrey C. Mogul, Eduardo Argollo, Mehul A. Shah, Paolo Faraboschi.
HotOS XII, May 2009.

Energy Efficiency: The New Holy Grail of Data Management Systems Research.
Stavros Harizopoulos, Mehul A. Shah, Justin Meza, Parthasarathy Ranganathan.
Conference on Innovative Data Systems Research (CIDR), January 2009.


A Practical Scalable Distributed B-Tree.
Marcos K. Aguilera, Wojciech Golab, and Mehul A. Shah.
International Conference on Very Large Data Bases (VLDB), August 2008.


Sinfonia: A New Paradigm for Building Scalable Distributed Systems.
Marcos K. Aguilera, Arif Merchant, Mehul A. Shah, Alistair Veitch, and Christos Karamanolis.
ACM Symposium on Operating Systems Principles (SOSP), October 2007. Best paper.

JouleSort: A Balanced Energy-Efficiency Benchmark.
Suzanne Rivoire, Mehul A. Shah, Parthasarathy Ranganathan, and Christos Kozyrakis.
ACM SIGMOD, June 2007.

Auditing to Keep Online Storage Services Honest.
Mehul A. Shah, Mary Baker, Jeffrey C. Mogul, and Ram Swaminathan.
HotOS XI, May 2007.


Pip: Detecting the Unexpected in Distributed Systems.
Patrick Reynolds, Charles Killian, Janet L. Wiener, Jeffrey C. Mogul, Mehul A. Shah, and Amin Vahdat.
Symp. on Networked Systems Design and Implementation (NSDI), May 2006.

A Fresh Look at the Reliability of Long-term Digital Storage.
Mary Baker, Mehul A. Shah, David S. H. Rosenthal, Mema Roussopoulos, Petros Maniatis, TJ Giuli, and Prashanth Bungale.
EuroSys, April 2006.

IT Infrastructure in Emerging Markets: Arguing for an End-to-End Perspective.
Ajay Gupta, Parthasarathy Ranganathan, Prashant Sarin, and Mehul Shah.
IEEE Pervasive Computing, April-June 2006.


Mehul A. Shah, Joseph M. Hellerstein and Eric Brewer
Highly-Available, Fault-Tolerant, Parallel Dataflows, SIGMOD, June 2004. [PDF]

Mehul A. Shah, Joseph M. Hellerstein, Sirish Chandrasekaran and Michael J. Franklin
Flux: An Adaptive Partitioning Operator for Continuous Query Systems, ICDE, March 2003. [PS] [PDF]
Longer, more complete technical report: [PDF]

Sailesh Krishnamurthy, Sirish Chandrasekaran, Owen Cooper, Amol Deshpande,
Michael J. Franklin, Joseph M. Hellerstein, Wei Hong, Samuel R. Madden, Fred Reiss, Mehul Shah
TelegraphCQ: An Architectural Status Report, IEEE Data Engineering Bulletin, March 2003.

Sirish Chandrasekaran, Owen Cooper, Amol Deshpande, Michael J. Franklin, Joseph M. Hellerstein,
Wei Hong, Sailesh Krishnamurthy, Samuel R. Madden, Vijayshankar Raman, Fred Reiss, and Mehul A. Shah.
TelegraphCQ: Continuous Dataflow Processing for an Uncertain World, In 1st CIDR Conf., Jan 2003

Samuel R. Madden, Mehul A. Shah, Joseph M. Hellerstein and Vijayshankar Raman
Continuously Adaptive Continuous Queries over Streams, SIGMOD Conference, 2002.

Mehul A. Shah, Samuel R. Madden, Michael J. Franklin and Joseph M. Hellerstein
Java Support for Data-intensive Systems: Experiences Building the Telegraph Dataflow System, SIGMOD Record, December 2001. [PS] [PDF]

Marcel Kornacker, Mehul A. Shah and Joseph M. Hellerstein
Amdb: A Design Tool for Access Methods, Technical Report. UC Berkeley. [PDF]

Mehul A. Shah, Marcel Kornacker and Joseph M. Hellerstein
Amdb: A Visual Access Method Development Tool, Proc. User Interfaces to Data Intensive Systems (UIDIS), September 1999. [PS] [PS.GZIP]

Henry Kautz, Bart Selman, and Mehul Shah.
ReferralWeb: Combining Social Networks and Collaborative Filtering. CACM 40(3): 63-65 (1997).

Henry Kautz, Bart Selman, and Mehul Shah.
The Hidden Web. AI Magazine 18(2): 27-36 (1997).


Mehul A. Shah
Flux: A Mechanism for Building Robust, Scalable Dataflows, U.C. Berkeley, PhD Thesis, Oct. 2004. [PDF]

Mehul A. Shah
ReferralWeb: A Resource Location System Guided by Personal Relations, M.I.T. MEng Thesis, May 1997. [PDF]
This was the first work that presented techniques for automatically extracting social networks from the web. Although I am the sole author (as required for all theses), this work was done jointly with Henry Kautz and Bart Selman while I was at AT&T Bell Labs. My thesis advisor (and collaborator) at MIT was David Karger.