Python special features
Things I find interesting about Python.
- 'for' walks over any iterable through the iterator protocol; it is not a counter-based loop
- Functions can be passed like variables
- Working on complex datasets at scale is easier, thanks to the data structures and built-in functions
- The libraries are insanely good: data visualization (Matplotlib), data analysis (pandas), multi-dimensional arrays and matrices (NumPy), scientific computing (SciPy).
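A minimal sketch of the first two points. The names (numbers, add_three, apply_twice) are made up for illustration. It shows that a for statement does the same work as calling iter() and next() by hand, and that a function can be handed to another function like any other value.

```python
# Illustrative data; none of these names come from the original notes.
numbers = [10, 20, 30]

# Traversal with a for statement.
collected_by_for = []
for n in numbers:
    collected_by_for.append(n)

# The same traversal written by hand with the iterator protocol.
collected_by_hand = []
iterator = iter(numbers)
while True:
    try:
        collected_by_hand.append(next(iterator))
    except StopIteration:
        # The iterator is exhausted, which is how a for statement knows to stop.
        break


def add_three(x):
    return x + 3


def apply_twice(func, value):
    # func is an ordinary argument that happens to be callable.
    return func(func(value))


result = apply_twice(add_three, 1)
print(collected_by_for == collected_by_hand)  # True
print(result)  # 7
```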
Tools to practise:
- I used Google Colab with Python 3.10. It saves the headache of setting up an IDE, works directly in Google Drive, and comes pre-loaded with standard libraries. You can also install libraries of your choice, although they are removed when the runtime resets (within about 24 hours). (Note: if you can't find Colab in Google Drive when creating a new doc, install it or just search for Google Colab.)
- Read Colab tricks here
- I saved my notes and lessons learnt on this page and my code in the Colab notebook -> vikrant payal python practice.ipynb.
Study of data structures in Python
Studied Python data structures in some depth as part of the AI MLOps course at IISc. Python data types studied: list, dict, tuple, set, and sequences.
Notes on Python's List sequence type
Typically used to hold items of a similar type, although a list can mix types. Items can be added or removed easily.
- sort vs. sorted.
sort is a method. It sorts the list (the original data structure).
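A small illustration, assuming some made-up values for the list t:

```python
# Illustrative values; only the name t comes from the notes.
t = [3, 1, 2]
returned = t.sort()  # reorders t itself and returns None

print(t)         # [1, 2, 3]
print(returned)  # None
```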
The code above sorts the list t in place, so the original order is lost. sorted is different: it is a built-in function that returns a new sorted list. Here's an example.
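A minimal sketch with made-up values. The name t2 is hypothetical and only holds the result:

```python
# Illustrative values; t2 is a hypothetical name for the returned list.
t1 = [3, 1, 2]
t2 = sorted(t1)  # builds and returns a new list

print(t1)  # [3, 1, 2]
print(t2)  # [1, 2, 3]
```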
The list t1 in the above example remains unchanged.
Notes on Python's Dictionary data structure type
Used for storing key-value pairs.
- The basics. You create a key-value pair using this syntax.
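A minimal sketch of the literal syntax. The dictionary name, keys and values are chosen only for illustration:

```python
# Hypothetical keys and values, for illustration only.
course = {"name": "AI MLOps", "institute": "IISc"}

# Assigning to a new key adds another key-value pair.
course["language"] = "Python"

print(course["name"])  # AI MLOps
print(len(course))     # 3
```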
pandas is a library that makes it easy to read and write data stored in different file formats and to manipulate it in bulk. It is a favorite for CSVs.
A great place to learn the basics of pandas is The gentle introduction to pandas by Rob Mulla.
The two main structures that pandas works on are called Series and DataFrames.
Example of initializing a Series
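A minimal sketch, assuming pandas is installed (it comes pre-loaded in Colab) and using made-up values:

```python
import pandas as pd

# Illustrative values; pandas assigns the index 0, 1, 2 automatically.
s = pd.Series([10, 20, 30])

print(s)
print(list(s.index))  # [0, 1, 2]
```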
Example of initializing a Dataframe
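A minimal sketch, again assuming pandas is installed. The rows and column names are made up for illustration:

```python
import pandas as pd

# Made-up rows; each inner list becomes one row of the table.
rows = [["Asha", 31], ["Ravi", 27]]

# With explicit column names.
df = pd.DataFrame(rows, columns=["name", "age"])

# Without column names, pandas numbers the columns itself.
df_default = pd.DataFrame(rows)

print(df)
print(list(df_default.columns))  # [0, 1]
```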
The column names argument is optional; the columns default to 0, 1, 2, ... if it is not specified. The first column of numbers in the printed output is the index, which the Series or DataFrame assigns automatically.
Some keyboard shortcuts and tricks to improve your Google Colab experience.
To run Unix commands, add a code cell and start the command with an exclamation mark.
This returns something like:
- You can turn on automatic line numbering for each code block. Click on the gear icon, then Editor, then Show line numbers. You can also set your favorite indent length here.
- Hitting Ctrl+Enter or Cmd+Enter runs the code block.
- Cmd+M followed by D (Ctrl+M followed by D on Windows or Linux) deletes the code block.