1. 基本概念
- ORM:对象关系映射,Object Relational Mapping。它的作用是在关系型数据库和对象之间作一个映射。不需要复杂的 SQL 语句,操作数据如同操作对象一样简单。
- QuerySet:给定模型的对象列表。QuerySet 允许从数据库中读取数据,对其进行筛选、排序等操作。
- Manager:django.db.models.manager.Manager,Django 用于表级功能的操作类。每个 model 都有一个默认的 Manager 实例,叫做 objects。
2. QuerySet
Django ORM 用到三个类:Model、Manager、QuerySet。Model 是数据模型。Manager 定义表级方法,当需要定制时,以 models.Manager 为父类,定义自己的 Manager 类,增加表级方法。而这部分主要讨论的 queryset 就是 Manager 类的一些方法会返回的 QuerySet 实例,QuerySet 是一个可遍历结构,包含一个或多个元素,每个元素都是一个 Model 实例,它里面的方法也是表级方法。
1
2
3
|
In [1]: bs = Basket.objects.values_list('weight','create_time')
In [2]: bs
Out[2]: [(2.1, datetime.datetime(2017, 8, 15, 20, 14, 9))]
|
1
2
3
|
In [1]: b = Basket.objects.values('weight','create_time')
In [2]: b
Out[2]: [{'create_time': datetime.datetime(2017, 8, 15, 20, 14, 9), 'weight': 2.1}]
|
extra 中可实现别名、条件、排序等,后面两个用 filter、exclude 可以实现,排序用 order_by 可以实现。这里主要看一下别名功能:比如 Basket 中有 weight,需要重命名为 w。
1
2
3
|
In [1]: b =Basket.objects.all().extra(select={'w':'weight'})
Out[1]: b[0].w
2.1
|
以求和为例:
1
2
3
|
In [1]:from django.db.models import Sum
In [2]:Basket.objects.values('weight').annotate(sum_weight=Sum('weight'))
Out[2]:[{'sum_weight': 2.1, 'weight': 2.1}]
|
select_related
优化一对一,多对一查询
一次查询,将外键数据获取到。
1
2
3
|
In [1]:b = Basket.objects.all().select_related('fruit')
In [2]:b[0].fruit.name
Out[2]: u'apple'
|
prefetch_related
优化一对多,多对多查询
prefetch_related
用于一对多、多对多 的情况,这时 select_related
用不了,因为当前一条有好几条与之相关的内容。prefetch_related
是通过再执行一条额外的SQL语句,然后用 Python 把两次SQL查询的内容关联(joining)到一起。
在复杂的情况下,表中可能有些字段内容非常多,取出来转化成 Python 对象会占用大量的资源,defer 可以排除掉部分字段。
和 defer 相反,only 用于取出需要的字段。
django.db.models 中有 Count, Avg, Sum 等。但是,有一些没有,比如 GROUP_CONCAT。这可以自定义 GroupConcat 类来实现相关功能。
当遍历 queryset 时,所有匹配的记录会从数据库获取,然后转换成 Django 的 model 。这些 model 会保存在 queryset 内置的 cache 中,这样如果再次遍历这个 queryset,不需要重复运行通用的查询。
1
2
3
4
5
6
7
|
b_set = Basket.objects.all()
# The query is executed and cached.
for b in b_set :
print(b.create_time)
# The cache is used for subsequent iteration.
for b in b_set :
print(b.weight)
|
3. Django 常用操作对应的 SQL 语句
首先,新建两个 models:
1
2
3
4
5
6
7
8
|
class Fruit(models.Model):
name = models.CharField(u'名称', default="", max_length=255)
price = models.FloatField(u"单价", default=0)
class Basket(models.Model):
create_time = models.DateTimeField(u'新增时间', auto_now_add=True)
fruit = models.ForeignKey(Fruit)
weight = models.FloatField(default=0.0)
|
Django 提供了 Shell 调试环境,输入命令:
可以进入 Console ,使用命令行操作 Django DB。
1
2
3
4
5
6
7
8
9
|
In [1]:from home_application.models import Basket
In [2]:Basket.objects.all()
Out[2]: [<Basket: Basket object>]
In [3]:from django.db import connection
In [4]:connection.queries
Out[4]:
[{u'sql': u'SET SQL_AUTO_IS_NULL = 0', u'time': u'0.001'},
{u'sql': u'SELECT `home_application_basket`.`id`, `home_application_basket`.`create_time`, `home_application_basket`.`fruit_id`, `home_application_basket`.`weight` FROM `home_application_basket` LIMIT 21',
u'time': u'0.002'}]
|
使用 connection.queries
可以查看到历史的 SQL 执行语句。为了仅显示当前操作的 SQL,这里每次查看 SQL 之后,使用 db.reset_queries()
清理一下 connection.queries
。还有一种方法可以获取当前 ORM 操作的 SQL 语句 print Basket.objects.all().query
。通过控制台直接打印查询集的 query 属性,输出:
1
|
SELECT `home_application_basket`.`id`, `home_application_basket`.`create_time`, `home_application_basket`.`fruit_id`, `home_application_basket`.`weight` FROM `home_application_basket`
|
下面,看下常见的 Django ORM 操作生成的 SQL :
1
2
3
4
5
6
7
|
In [1]:from django import db
In [2]:db.reset_queries()
In [3]:Basket.objects.filter(weight=2.1)
In [4]:connection.queries
Out[4]:
[{u'sql': u'SELECT `home_application_basket`.`id`, `home_application_basket`.`create_time`, `home_application_basket`.`fruit_id`, `home_application_basket`.`weight` FROM `home_application_basket` WHERE `home_application_basket`.`weight` = 2.1 LIMIT 21',
u'time': u'0.001'}]
|
1
2
3
4
5
6
|
In [1]:db.reset_queries()
In [2]:Basket.objects.get(weight=2.1)
In [3]:connection.queries
Out[3]:
[{u'sql': u'SELECT `home_application_basket`.`id`, `home_application_basket`.`create_time`, `home_application_basket`.`fruit_id`, `home_application_basket`.`weight` FROM `home_application_basket` WHERE `home_application_basket`.`weight` = 2.1',
u'time': u'0.000'}]
|
Django 中使用在字段名称后面追加 __gt
、__lt
来实现,范围查询。
1
2
3
4
5
6
|
In [1]:db.reset_queries()
In [2]:Basket.objects.filter(create_time__gte='1999-01-01')
In [3]:connection.queries
Out[3]:
[{u'sql': u"SELECT `home_application_basket`.`id`, `home_application_basket`.`create_time`, `home_application_basket`.`fruit_id`, `home_application_basket`.`weight` FROM `home_application_basket` WHERE `home_application_basket`.`create_time` >= '1999-01-01 00:00:00' LIMIT 21",
u'time': u'0.001'}]
|
1
2
3
4
5
6
|
In [1]:db.reset_queries()
In [2]:Basket.objects.exclude(create_time__gte='1999-01-01')
In [3]:connection.queries
Out[3]:
[{u'sql': u"SELECT `home_application_basket`.`id`, `home_application_basket`.`create_time`, `home_application_basket`.`fruit_id`, `home_application_basket`.`weight` FROM `home_application_basket` WHERE NOT (`home_application_basket`.`create_time` >= '1999-01-01 00:00:00') LIMIT 21",
u'time': u'0.000'}]
|
1
2
3
4
5
6
7
|
In [1]:db.reset_queries()
In [2]:Basket.objects.filter(fruit__name="apple")
In [3]:connection.queries
Out[3]:
[{u'sql': u"SELECT `home_application_basket`.`id`, `home_application_basket`.`create_time`, `home_application_basket`.`fruit_id`, `home_application_basket`.`weight` FROM `home_application_basket` INNER JOIN `home_application_fruit` ON ( `home_application_basket`.`fruit_id` = `home_application_fruit`.`id`
) WHERE `home_application_fruit`.`name` = 'apple' LIMIT 21",
u'time': u'0.001'}]
|
1
2
3
4
5
6
7
|
In [1]:db.reset_queries()
In [2]:from django.db.models import Q
In [3]:Basket.objects.filter(Q(weight=2.1) | Q(weight=2.0))
In [4]:connection.queries
Out[4]:
[{u'sql': u'SELECT `home_application_basket`.`id`, `home_application_basket`.`create_time`, `home_application_basket`.`fruit_id`, `home_application_basket`.`weight` FROM `home_application_basket` WHERE (`home_application_basket`.`weight` = 2.1 OR `home_application_basket`.`weight` = 2) LIMIT 21',
u'time': u'0.001'}]
|
1
2
3
4
5
6
|
In [1]:db.reset_queries()
In [2]:Basket.objects.filter(weight__in=[2.1, 2.0])
In [3]:connection.queries
Out[3]:
[{u'sql': u'SELECT `home_application_basket`.`id`, `home_application_basket`.`create_time`, `home_application_basket`.`fruit_id`, `home_application_basket`.`weight` FROM `home_application_basket` WHERE `home_application_basket`.`weight` IN (2.1, 2) LIMIT 21',
u'time': u'0.001'}]
|
1
2
3
4
5
6
|
In [1]:db.reset_queries()
In [2]:Basket.objects.filter(weight__contains='2')
In [3]:connection.queries
Out[3]:
[{u'sql': u"SELECT `home_application_basket`.`id`, `home_application_basket`.`create_time`, `home_application_basket`.`fruit_id`, `home_application_basket`.`weight` FROM `home_application_basket` WHERE `home_application_basket`.`weight` LIKE BINARY '%2%' LIMIT 21",
u'time': u'0.001'}]
|
1
2
3
4
5
6
|
In [1]:db.reset_queries()
In [2]:Basket.objects.filter(weight=2.1).count()
In [3]:connection.queries
Out[3]:
[{u'sql': u"SELECT COUNT('*') AS `__count` FROM `home_application_basket` WHERE `home_application_basket`.`weight` = 2.1",
u'time': u'0.002'}]
|
1
2
3
4
5
6
|
In [1]:db.reset_queries()
In [2]:Basket.objects.all().order_by('create_time')
In [3]:connection.queries
Out[3]:
[{u'sql': u'SELECT `home_application_basket`.`id`, `home_application_basket`.`create_time`, `home_application_basket`.`fruit_id`, `home_application_basket`.`weight` FROM `home_application_basket` ORDER BY `home_application_basket`.`create_time` ASC LIMIT 21',
u'time': u'0.001'}]
|
1
2
3
4
5
6
|
In [1]:db.reset_queries()
In [2]:Basket.objects.all().order_by('create_time', '-weight')
In [3]:connection.queries
Out[3]:
[{u'sql': u'SELECT `home_application_basket`.`id`, `home_application_basket`.`create_time`, `home_application_basket`.`fruit_id`, `home_application_basket`.`weight` FROM `home_application_basket` ORDER BY `home_application_basket`.`create_time` ASC, `home_application_basket`.`weight` DESC LIMIT 21',
u'time': u'0.001'}]
|
1
2
3
4
5
6
7
8
9
10
|
In [1]:db.reset_queries()
In [2]:b = Basket.objects.get(pk=1)
In [3]:b.weight = 2.0
In [4]:b.save()
In [5]:connection.queries
Out[5]:
[{u'sql': u'SELECT `home_application_basket`.`id`, `home_application_basket`.`create_time`, `home_application_basket`.`fruit_id`, `home_application_basket`.`weight` FROM `home_application_basket` WHERE `home_application_basket`.`id` = 1',
u'time': u'0.001'},
{u'sql': u"UPDATE `home_application_basket` SET `create_time` = '2017-08-08 18:00:59', `fruit_id` = 1, `weight` = 2 WHERE `home_application_basket`.`id` = 1",
u'time': u'0.003'}]
|
1
2
3
4
5
6
|
In [1]:db.reset_queries()
In [2]:Basket.objects.filter(weight=2.0).update(weight=2.1)
In [3]:connection.queries
Out[3]:
[{u'sql': u'UPDATE `home_application_basket` SET `weight` = 2.1 WHERE `home_application_basket`.`weight` = 2',
u'time': u'0.003'}]
|
1
2
3
4
5
6
|
In [1]:db.reset_queries()
In [2]:Basket.objects.filter(weight=2.1).delete()
In [3]:connection.queries
Out[3]:
[{u'sql': u'DELETE FROM `home_application_basket` WHERE `home_application_basket`.`weight` = 2.1',
u'time': u'0.003'}]
|
- 提前 filter 需要处理的对象并不能减少 SQL 查询
可以看到如果不使用查询对象,不会产生 SQL 查询。仅当对查询集进行操作时,SQL 才会开始查询。将查询集保存,是为了利用内置的 Cache。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
|
In [1]:db.reset_queries()
In [2]:all = Basket.objects.all()
In [3]:connection.queries
Out [3]: []
In [4]: all
In [5]:connection.queries
Out [5]:
[{u'sql': u'SELECT `home_application_basket`.`id`, `home_application_basket`.`create_time`, `home_application_basket`.`fruit_id`, `home_application_basket`.`weight` FROM `home_application_basket` LIMIT 21',
u'time': u'0.000'}]
In [6]: all.filter(weight=2.1)
Out [6]:
[{u'sql': u'SELECT `home_application_basket`.`id`, `home_application_basket`.`create_time`, `home_application_basket`.`fruit_id`, `home_application_basket`.`weight` FROM `home_application_basket` LIMIT 21',
u'time': u'0.000'},
{u'sql': u'SELECT `home_application_basket`.`id`, `home_application_basket`.`create_time`, `home_application_basket`.`fruit_id`, `home_application_basket`.`weight` FROM `home_application_basket` WHERE `home_application_basket`.`weight` = 2.1 LIMIT 21',
u'time': u'0.000'}]
|
4. SQL 执行性能
除了 ID 字段,其他字段默认不建立索引。通过设置,db_index
属性,可以自行添加索引,对 filter()、exclude()、order_by()
操作会有显著性能提升,例如:models.DateField(db_index=True)。
以下七种情况, 会查询数据库并生成cache, 不用再重新连数据库进行查询
- Iteration, ie. 对Queryset进行For循环的操作.
- slicing, e.g. Entry.objects.all()[:5], 获取 queryset 中的前五个对象, 相当于 SQL 中的LIMIT 5
- picling/caching
- repr/str
- len (Note: 如果你只想知道这个queryset结果的长度的话, 最高效的还是在数据库的层级调用count()方法, 也就是sql中的COUNT(). )
- list()
- bool()比如:
1
2
3
|
>>> queryset = Entry.objects.all()
>>> print([p.headline for p in queryset]) # Evaluate the query set.
>>> print([p.pub_date for p in queryset]) # Re-use the cache from the evaluation.
|
使用 select_related()
、prefetch_related()
、 values_list()
、values()方法
- 如果查出的 queryset 只用一次, 可以使用 iterator() 去来防止占用太多的内存
- bulk(批量)地去 insert、update和delete数据
- 用 count() 代替len(queryset),用 exists() 代替if queryset
5. 参考
- https://github.com/jazzband/django-model-utils
- http://www.zlovezl.cn/articles/sqlalchemy-intro-for-django-guys/
- http://code.ziqiangxuetang.com/django/django-queryset-advance.html
- http://www.cnblogs.com/ajianbeyourself/p/3604332.html
- http://code.ziqiangxuetang.com/django/django-queryset-advance.html
- http://coolshell.cn/articles/1846.html