

Python for Data Analysis is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. It is also a practical, modern introduction to scientific computing in Python, tailored for data-intensive applications. This is a book about the parts of the Python language and libraries you'll need to effectively solve a broad set of data analysis problems. This book is not an exposition on analytical methods using Python as the implementation language. Written by Wes McKinney, the main author of the pandas library, this hands-on book is packed with practical case studies. It's ideal for analysts new to Python and for Python programmers new to scientific computing.
- Use the IPython interactive shell as your primary development environment
- Learn basic and advanced NumPy (Numerical Python) features
- Get started with data analysis tools in the pandas library
- Use high-performance tools to load, clean, transform, merge, and reshape data
- Create scatter plots and static or interactive visualizations with matplotlib
- Apply the pandas groupby facility to slice, dice, and summarize datasets
- Measure data by points in time, whether it's specific instances, fixed periods, or intervals
- Learn how to solve problems in web analytics, social sciences, finance, and economics, through detailed examples
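As a rough illustration of the "groupby facility" named in the product description, here is a minimal sketch. The `visits` table and its column names are invented for this example and do not come from the book; only `DataFrame.groupby` and `agg` are real pandas calls.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
visits = pd.DataFrame({
    "country": ["ES", "ES", "US", "US", "US"],
    "browser": ["Firefox", "Chrome", "Chrome", "Chrome", "Safari"],
    "pages": [3, 5, 2, 8, 4],
})

# Split the rows by country, then summarize the "pages" column per group.
summary = visits.groupby("country")["pages"].agg(["count", "sum", "mean"])
print(summary)
```

The same pattern extends to several grouping keys, for example `visits.groupby(["country", "browser"])`.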















| Best Sellers Rank | #1,270,800 in Books; #513 in Data Processing; #1,052 in Python Programming; #1,103 in Computer Programming Languages |
| Customer Reviews | 4.2 out of 5 stars 336 Reviews |
P**Z
End of 2013 Kindle Update
Many ebooks (not just Kindle) have problems with math formulas in LaTeX. Others (like this one) have code or pseudocode, and lots of tables, which are problematic at times. If you get this book for Kindle in 2014 or late '13, you are in for a treat: not only the online goodies, but the entire ebook itself has been extensively revised for Kindle, including code and tables. They are outstanding! Our previous Kindle edition wasn't awful, but this is just awesome now. If you're tired of R glitches and complexity, consider the many new (and FREE) features Wes details in this fine text, especially tips for free libraries and APIs, including of course NumPy and others that used to require a lot more math than they do today. Wes even has many plug and plays, and if you have even beginning skills in any OOP language (Java/C#), this will be easier than starting R from scratch. It has "nearly" the stats of R, and much more on all kinds of big data, not just research data. Highly recommended for my fellow Kindlers. The native object recursion in Python APIs is alone worth this compared to R functional workarounds, even though I use both. Prior to this book, you'd be spending a LOT of time putting all this together visiting forums, libraries and APIs online.
IMPORTANT NOTE on previous negative reviews: This new update not only fixes many issues with the tables for Kindle, but as you probably know if you're a pandas person, the online functional documentation for the library has been massively updated between late '12 and late '13. The author (the creator of the API) takes advantage of this with the Kindle links. This makes this book a MUCH better reference than the last edition, partly due to the new update and partly due to the value of the community work on functions, types and methods, which of course this author often leads. So, even if you have an older print version, many of the deficiencies (that frankly were not this author's fault!) are gone there too, because the links are still active and much better for configuring your code than before. This still isn't meant to be a "documentation" book, but, with the newly updated links, there are few programs you can't build now, including a LOT more detail on the functions themselves, with good keywords to augment the already fine examples and exercises here. Also, much less "heavy" from a programming view than most O'Reilly tomes: this author obviously understands beginners, and though this is not a how-to-learn-Python book, it IS now a much better how-to for picking up DA, including pandas, NumPy and other plug-ins.
M**W
A great introduction to Python's data analysis libraries
Wes McKinney provides an introduction to the most popular and critical libraries for doing data analysis with the Python language. The book does not delve much into advanced data analysis (statistical methods, for example), but provides an excellent starting point for understanding the main tools and a strong tour of what can be done with Python in the data analysis field. The text focuses strongly on the pandas library, which is used for the actual data manipulation, but provides a strong introduction to NumPy, matplotlib, and the IPython environment, which is used by most of the Python data analysis community. I would have liked to see stronger coverage of SciPy and at least a chapter devoted to the statsmodels library, both of which are mentioned but not discussed in great detail. These libraries deal with statistical methods and advanced analysis, whereas the focus of this text seems to be more on preparing data for these sorts of topics. A sequel covering these advanced topics would be greatly welcome; they are probably beyond the scope of this particular text and would add too much length. There have been some API changes since the book was published which do affect some of the examples provided, but they all seem to be identified in the book's errata. The only issue that I didn't find identified is a change in how pandas' DataFrame objects are displayed in an interactive environment. This means that some of the outputs will look different from what is shown in the book, but pandas does provide an option to restore the older behavior shown. Readers should already be familiar with the Python language. Some negative reviews stress that the book does not teach Python, but that is not really its intent. An appendix introducing the Python language is provided, but the language cannot be taught in that short a space. The purpose of the text is to introduce the reader to a specific use of Python, and it does well in that respect.
This text was my first introduction to these libraries, and I have always used R for any data analysis work. I was impressed with how similar these are to using R, and many of the libraries feel strongly like R written in Python. Readers already familiar with R will have no problem following along in the examples and will likely pick up the material very quickly. For readers new to data analysis there will likely be a steeper learning curve, but McKinney does provide excellent and detailed examples that should allow those readers to pick up the material quickly as well. Other than the API changes, which will always present an unavoidable issue with any text on the subject (especially as pandas seems to be evolving very quickly), and the lack of coverage of some essential data analysis libraries, the book is strongly recommended for anyone wishing to start using the Python language for data analysis. (I received an electronic copy of the book as part of the O'Reilly reader review program, but was impressed enough to purchase a printed copy.)
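The review above does not name the display option it mentions. One plausible candidate, and this is an assumption on the editor's part, is pandas' `display.large_repr` setting, which switches the representation of large frames from a truncated table to an `info()`-style summary. A minimal sketch with a made-up frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame, large enough to exceed the row limit set below.
df = pd.DataFrame(np.arange(1000).reshape(200, 5), columns=list("abcde"))

# Assumption: "display.large_repr" is the option the reviewer has in mind.
# With "info", a frame that does not fit on screen is summarized
# (index range, column dtypes, memory) instead of shown as a truncated table.
with pd.option_context("display.large_repr", "info", "display.max_rows", 60):
    text = repr(df)

print(text)
```

Outside the `with` block the default (`"truncate"`) applies again; `pd.set_option("display.large_repr", "info")` would make the change for the whole session.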
T**Y
This is an excellent book, assuming you like the author's approach to ...
This is an excellent book, assuming you like the author's approach to computation/data. This is in a sense also a review of pandas. I should emphasize I am NOT a programmer, in the proper sense. I am a (computational) physicist and have transitioned all of my data storage/analysis to pandas, for reasons I'll explain. A large portion of my work is "exploratory", where I try out many different ideas, hoping something sticks. I've wasted a large amount of time hacking away trying to piece together a somewhat complicated calculation on fairly abstract data sets, only to eventually lose track of what physics I'm trying to do because of how sloppy things get. Again, I am not a programmer! Computation is a tool to me, and time spent trying to make a tool work is time away from the actual job. Though I have only used it in earnest for a few months now, pandas has increased my productivity tremendously. The organization/philosophy behind the program is amazing. Often (though less and less, thankfully) I find myself reverting to my old habits of working with a sloppy mixture of dicts, np.arrays, and classes, because I feel like I can do it faster/easier than setting it up in pandas. I am never right. Once I set up the problem in pandas, everything I could possibly want to do flows naturally. If you work with physical data and perform relatively complex calculations/transformations on it, I strongly recommend pandas and this book. Regarding the book, I will only say that by reading the author's (of the code and book!) perspective you quickly gain an appreciation for how powerful pandas can be.
D**C
FOR ANYONE WHO HAS TO MINE DATA
I had to learn Python quickly for a project that involved reading data from databases and web services, manipulating it, and eventually storing it. This book saved me by introducing me to pandas. I fell in love immediately and ended up doing the project in Python using SQLAlchemy and pandas DataFrames. Since pandas also relies on SQLAlchemy, I suspect I could have done it with pandas alone. Wes McKinney is clearly a data geek. His examples are a bit harder to follow than those of other writers, but the depth of his knowledge, both in problem solving and in using Python to do it, makes the effort to follow them worth it a thousandfold. He covers accessing data from numerous types of sources and walks you through solving really nasty data problems using simple tools. I found his writing clear, though probably not concise. No matter, he quickly gave me what I needed.
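To illustrate the database-to-DataFrame workflow this reviewer describes, here is a minimal sketch. It deliberately swaps SQLAlchemy for the standard library's `sqlite3` module so that it runs without extra setup; the `orders` table and its contents are invented for the example.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory database standing in for the reviewer's data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 10.0), ("ben", 5.5), ("ana", 2.5)],
)
conn.commit()

# Load the query result straight into a DataFrame, then manipulate it.
orders = pd.read_sql_query("SELECT customer, amount FROM orders", conn)
totals = orders.groupby("customer")["amount"].sum()
conn.close()

print(totals)
```

For databases other than SQLite, `pd.read_sql_query` expects a SQLAlchemy connectable in place of the `sqlite3` connection.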
B**M
Excellent book -- highly recommended
This was a much-needed book. Kudos to Wes McKinney for starting pandas and then writing a book about using it. There is a lot of public documentation/videos on pandas, but this one does a great job of introducing the many ways pandas can be used. After going back and forth on developing my own 'Table' structure, I decided on pandas. As a lot of people have commented, the pandas APIs look straightforward and simple until you start trying to do something useful. My point is, the experience gets a lot better, and this book goes a long way towards that goal. The author does try to showcase how he would use Python (in general), and that can be useful for improving Python skills. There are also some comments and explanations about speed/performance, with some discussion of NumPy, so it's a useful primer, especially if you want to think about modifying NumPy/pandas or developing your own high-performance Python module. This book does assume that you are not a Python beginner, but chances are you are not if you are looking into pandas. It is part cookbook, part tutorial. I've read through the book once, and then found individual chapters particularly useful when working on specific areas of my application. Beware that pandas is still under active development; the author does a pretty good job of pointing out potential issues, but bugs get fixed across releases, so some instructions may not work perfectly. [All examples are probably okay at the time of writing, though, and I don't think there should be any concern with those.] As far as organization goes, the book jumps around a little across topics, but that cannot be avoided since they're so interrelated. No points off for that. -1 star for two reasons: a) I would have liked to have seen some more examples where people might want to use pandas in slightly different contexts, although that will probably be fixed easily in the future.
b) On the topic of performance, the author has clearly shown that pandas is high performance when it comes to fairly large data, BUT it is typically slower when working with small quantities that are better handled in list-of-list routines. I profiled a few of the functions as I was converting my own date parsers and list-of-list/dict-of-list structures and found pandas slightly slower in many cases. The same is true of NumPy, as it is of anything that incurs the overhead of C function calls, but some ideas about break-even quantities would be nice. Perhaps this can be handled in the next edition too. Overall, I recommend the book highly; it belongs on the bookshelf of any active Python developer who deals with in-memory data. For applications in econometric and financial analysis, this book is essential!
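The break-even point this reviewer asks about can be estimated on one's own machine. The sketch below is one hedged way to do so, not the reviewer's actual profiling: `time_sum` is a hypothetical helper that times a plain-Python `sum` over a list against `Series.sum` for a given size. Results vary with hardware and library versions, so no expected numbers are given.

```python
import timeit

import pandas as pd


def time_sum(n, repeats=200):
    """Time summing n integers as a plain list and as a pandas Series.

    Hypothetical helper for illustration; returns (list_seconds, pandas_seconds).
    """
    values = list(range(n))
    series = pd.Series(values)
    t_list = timeit.timeit(lambda: sum(values), number=repeats)
    t_pandas = timeit.timeit(lambda: series.sum(), number=repeats)
    return t_list, t_pandas


# Compare a tiny, a medium, and a large input to see where the balance tips.
for n in (10, 1_000, 100_000):
    t_list, t_pandas = time_sum(n)
    print(f"n={n:>7}: list {t_list:.5f}s  pandas {t_pandas:.5f}s")
```

Only construction-free summation is timed here; building the Series from a list has its own cost, which matters if data is converted repeatedly.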
K**P
Finally, I can ditch R and use a language I love
I've been using Python as my primary language for 10 years on and off, but have been shackled to R for any statistics or graphing for lack of knowledge of the scientific Python environment and no clear place to learn it all. I knew Python would be perfect for data analysis, but never knew where to begin. Because of this book, I can finally say that I am completely R free and loving it! The book is incredibly well written by the guy who developed the pandas library. He brings his practical data analysis experience into this text and it shines through. Each chapter takes you through the core libraries and tools that you'll need to conduct real data analysis from beginning to end. He is especially sensitive to the realities of handling real-world data, which is often messy and needs to be massaged into a usable form, and which Python and its libraries are ridiculously good at handling. The introduction to IPython is perfect for anyone coming from MATLAB/R/etc. who has been missing a lot of the interactive features that those languages offer by default. You should have at minimum an introductory understanding of Python and statistics, which you likely have if you're the kind of person who would think to pick this book up in the first place. Other than that, this book will teach you how to conduct data analysis in the best possible way with the best possible language.
J**N
Good for beginners. Faint printing problem. Somewhat outdated in 2016.
Pretty good resource for a beginner to Python's data analysis libraries. This book is not for a Python beginner. Although Wes McKinney wrote pandas, I feel that the pandas part of the book is somewhat outdated or redundant in 2016. There is an excellent online resource maintained by Wes and other key contributors that is up to date and, in my opinion, has better content for beginners learning pandas. Because of several different contributors' perspectives, I find the examples and explanations better than those in the book. In 2012, when this book was published, that online resource may not have been as good. I found the pandas Time Series and Financial Applications chapters interesting, but they are also replicated on the website. So the book was improved upon by the author's own website (with the help of other contributors). :-) See [...] In particular, see sections: Tutorials, Intro to Data Structures - Series and DataFrame, and Essential Basic Functionality. The remaining 1/4 of the book had a very useful, concentrated intro to NumPy, Advanced NumPy, and a Python Essentials reference. This book does not cover the newer development of R function calls from Python. In my opinion, R is winning the R vs. pandas argument due to ggplot2 and statistical learning professors publishing code first in R. Since R is now easy to use from within Python, pandas might not get as much use. But it's still useful to know how to use pandas as part of a data analyst's toolkit. I also want to warn buyers about faint printing on several physical copies of this book. I bought from Amazon AND directly from O'Reilly Media in trying to get a physical book that had good, solid printing on all pages. This was not possible. The physical book from O'Reilly had even fainter/worse printing than the version I got from Amazon. Better to save your money and just go with the eBook version if you are OK with that, which you can usually find cheaper online.
O'Reilly puts on excellent conferences, but may be getting out of the printed book business. I guess most programmers buy eBooks now. I just find eBooks difficult to deal with when it comes to dense, technical books. I am fine with eBooks for fiction or more narrative non-fiction such as economics, popular science, or history.
R**M
Python for Data Analysis
I first became aware of Wes McKinney's pandas, the Python-based data analysis software, when I saw a video of his 3-hour tutorial presentation at PyCon in March 2012. [...] I was literally blown away by the ease with which he was able to manipulate, analyze, and plot/graph results. I have been using Python and a database, along with a little R and Excel, for analysis, and this appeared to be a new level of capability. I installed pandas several months ago, but needless to say, as with any set of complex tools, a novice user isn't going to become fluent with pandas without a significant learning curve. The online pandas documentation is excellent, with many examples. When this book became available I pre-ordered it before publication. Although I am only a third of the way through the book, I am ready to highly recommend it. I have been working through his example problems using the downloadable data he provides on a website. Although one can just start using pandas with one's own problems, working through Wes's example problems is a neat way to learn the data analysis capabilities of a combination of Python the core language, NumPy for numerical analysis, matplotlib for plotting, and IPython the user environment. The open source software is actually becoming better than, and even as well documented as, the very expensive commercial equivalents. A minor problem is that I was never able to get the IPython notebook running under Windows 7 64-bit, but the Qt console has been more than sufficient.
S**H
For the technical hands on developer getting into data analysis
If you needed one book to guide you through pandas for data analysis and also the underlying numpy (and more), this is it. While reading the book you cannot help but admire the comprehensive way in which the author guides you through the tool and the problems that it solves. Also thanks to the author for creating this wonderful tool!
O**D
Very clear and well structured
My experience with O'Reilly books so far has been mixed: often they either assume a very strong foundation or make jumps between chapters that are hard to follow. Not so in this book. Wes McKinney has really managed to put a careful, gap-free structure on paper without getting lost in details. The book starts with NumPy to lay the groundwork, explains data formats such as DataFrames very carefully, and works its way slowly but surely up to quite complex topics. I can really only recommend the book, including as a reference work.
A**X
Useful but not very didactic
It presents the pandas structures, which are very good, but the book spends too long explaining arrays, and when it finally gets down to pandas itself it falls a little short.
L**N
Amazing
This book has taught me a lot. I always find better answers to my problems in here than in all the Stack Overflow posts I've come across. Very well written and informative. A joy to read.
M**C
Perfect for Data Mining
A nice and quite complete introduction to Pandas Analysis and all the goodies it provides. It also provides tricks for data mining.